Could you file a JIRA for this? The executor should run under the user who submits the job, I think.
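In the meantime, a quick way to confirm which OS user the executors actually run as is to ask them directly (a minimal PySpark sketch, assuming an active SparkContext `sc`; the partition count is arbitrary):

    import getpass

    # Each task reports the OS-level user it runs under; distinct()
    # collapses the answers from all executors into a short list.
    users = (sc.parallelize(range(100), 10)
               .map(lambda _: getpass.getuser())
               .distinct()
               .collect())
    print(users)  # expect ['root'] when the workers were started from the root account

If that prints ['root'] no matter who submitted the job, it confirms that standalone executors currently ignore the submitting user.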
On Wed, May 20, 2015 at 2:40 AM, Tomasz Fruboes <tomasz.frub...@fuw.edu.pl> wrote:
> Thanks for the suggestion. I have tried playing with it; sc.sparkUser() gives
> me the expected user name, but it doesn't solve the problem. From a quick
> search through the Spark code it seems to me that this setting is effective
> only for YARN and Mesos.
>
> I think a workaround for the problem could be using "--deploy-mode cluster"
> (not 100% convenient, since it disallows any interactive work), but this is
> not supported for Python-based programs.
>
> Cheers,
>  Tomasz
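That matches my reading as well: the driver side picks SPARK_USER up, but the standalone worker does not. The driver-side half is easy to check (an untested sketch; the user name is a placeholder, and the variable must be set before the SparkContext -- and hence the JVM -- is created):

    import os
    # Placeholder user name; this must happen before the context (and the
    # Py4J-launched JVM, which inherits the environment) is created.
    os.environ["SPARK_USER"] = "tfruboes"

    from pyspark import SparkContext
    sc = SparkContext(appName="spark-user-check")
    print(sc.sparkUser())  # reports 'tfruboes' on the driver

Meanwhile the task directories on the workers stay owned by root, as described below.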
>
> On 20.05.2015 at 10:57, Iulian Dragoș wrote:
>>
>> You could try setting `SPARK_USER` to the user under which your workers
>> are running. I couldn't find many references to this variable, but at
>> least YARN and Mesos take it into account when spawning executors.
>> Chances are that standalone mode also does it.
>>
>> iulian
>>
>> On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes
>> <tomasz.frub...@fuw.edu.pl> wrote:
>>
>> Hi,
>>
>> thanks for the answer. The rights are
>>
>> drwxr-xr-x 3 tfruboes all 5632 05-19 15:40 test19EE/
>>
>> I have tried setting the rights to 777 for this directory prior to
>> execution. This does not get propagated down the chain, i.e. the
>> directory created as a result of the "save" call
>> (namesAndAges.parquet2 in the path in the dump [1] below) is created
>> with drwxr-xr-x rights (owned by the user submitting the job, i.e.
>> tfruboes). The temp directories created inside
>>
>> namesAndAges.parquet2/_temporary/0/
>>
>> (e.g. task_201505200920_0009_r_000001) are owned by root, again with
>> drwxr-xr-x access rights.
>>
>> Cheers,
>>  Tomasz
>>
>> On 19.05.2015 at 23:56, Davies Liu wrote:
>>
>> It surprises me, could you list the owner information of
>> /mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
>>
>> On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
>> <tomasz.frub...@fuw.edu.pl> wrote:
>>
>> Dear Experts,
>>
>> we have a Spark cluster (standalone mode) in which the master and
>> workers are started from the root account. Everything runs correctly
>> up to the point when we try operations such as
>>
>> dataFrame.select("name", "age").save(ofile, "parquet")
>>
>> or
>>
>> rdd.saveAsPickleFile(ofile)
>>
>> where ofile is a path on a network-exported filesystem (visible on
>> all nodes; in our case this is Lustre, I guess on NFS the effect
>> would be similar).
>>
>> Unsurprisingly, the temp files created on the workers are owned by
>> root, which then leads to a crash (see [1] below). Is there a
>> solution/workaround for this (e.g. controlling the file creation
>> mode of the temporary files)?
>>
>> Cheers,
>>  Tomasz
>>
>> ps I've tried to google this problem; there are a couple of similar
>> reports, but no clear answer/solution to be found.
>>
>> ps2 For completeness - running the master/workers as a regular user
>> solves the problem only for that user. For other users submitting to
>> this master, the result is given in [2] below.
>>
>> [0] Cluster details:
>> Master/workers: CentOS 6.5
>> Spark 1.3.1 prebuilt for Hadoop 2.4 (same behaviour with the 2.6 build)
>>
>> [1]
>> ##################################################################
>>   File "/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
>> : java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_000001/part-r-00002.parquet; isDirectory=false; length=534; replication=1; blocksize=33554432; modification_time=1432042832000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-00002.parquet
>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
>>   at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
>>   at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
>>   at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
>>   at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
>>   at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
>>   at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
>>   at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>   at py4j.Gateway.invoke(Gateway.java:259)
>>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>   at java.lang.Thread.run(Thread.java:745)
>> ##################################################################
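As a side note, the failing sequence is small enough to fit in a short script for anyone who wants to reproduce it (a sketch against the Spark 1.3 Python API; the shared-filesystem path and the toy dataset are placeholders):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext(appName="root-owned-tmpdir-repro")
    sqlContext = SQLContext(sc)

    # The output must live on a filesystem shared by all workers
    # (Lustre, NFS, ...) for the ownership mismatch to show up.
    ofile = "/mnt/shared/tmp/namesAndAges.parquet2"  # placeholder path

    people = sc.parallelize([Row(name="Alice", age=30), Row(name="Bob", age=25)])
    df = sqlContext.createDataFrame(people)

    # Each task writes its part fine; the job dies at commit time, when
    # files under _temporary/ (created by the root-owned executors) are
    # renamed into place.
    df.select("name", "age").save(ofile, "parquet")

Submitted by a non-root user against root-owned workers, this should end in the "Failed to rename" error shown in [1].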
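On the question of controlling the file creation mode: one untested idea is to relax the umask of the worker daemons. conf/spark-env.sh is sourced by the daemon start scripts, so a umask set there should be inherited by the executor JVMs:

    # conf/spark-env.sh on every worker node.
    # Speculative workaround, not a confirmed fix: files created by the
    # root-owned executors come out world-writable, so other users can
    # rename or delete them.
    umask 000

Even if this lets the commit go through, the output would still be owned by root, so it is a mitigation at best; whether it also cures the "Failed to rename" case in [1] would need testing.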
>>
>> [2]
>> ##################################################################
>> 15/05/19 14:45:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, wn23023.cis.gov.pl):
>> java.io.IOException: Mkdirs failed to create file:/mnt/lustre/bigdata/med_home/tmp/test18/namesAndAges.parquet2/_temporary/0/_temporary/attempt_201505191445_0009_r_000000_0
>>   at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
>>   at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>>   at parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:154)
>>   at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:279)
>>   at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
>>   at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
>>   at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>   at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:745)
>> ##################################################################
>>
>> --
>> Iulian Dragos
>>
>> ------
>> Reactive Apps on the JVM
>> www.typesafe.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org