Dear Experts,

We have a Spark cluster (standalone mode) in which the master and workers are started from the root account. Everything runs correctly up to the point where we try operations such as

    dataFrame.select("name", "age").save(ofile, "parquet")

or

    rdd.saveAsPickleFile(ofile)

where ofile is a path on a network-exported filesystem visible on all nodes (in our case this is Lustre; I guess the effect on NFS would be similar).
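
For reference, here is a minimal self-contained script that reproduces the crash on our setup (the toy schema is just an example; the output path matches the trace in [1]):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="save-test")
    sqlContext = SQLContext(sc)

    # Any DataFrame written to the shared filesystem triggers the problem.
    dataFrame = sqlContext.createDataFrame(
        sc.parallelize([("Alice", 30), ("Bob", 25)]), ["name", "age"])

    ofile = "file:///mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2"
    dataFrame.select("name", "age").save(ofile, "parquet")  # crashes as in [1]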

Unsurprisingly, the temporary files created on the workers are owned by root, which then leads to a crash (see [1] below). Is there a solution or workaround for this, e.g. a way of controlling the file-creation mode of the temporary files?
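
Something along these lines would be ideal. This is only a guess at what such a knob might look like: fs.permissions.umask-mode is a standard Hadoop 2 configuration key, but I have not verified that the local-filesystem path used for file:// output honours it:

    # Hoped-for workaround (untested guess): relax the umask used by
    # Hadoop's filesystem layer so that files created by root remain
    # writable/renamable when the job is committed.
    sc._jsc.hadoopConfiguration().set("fs.permissions.umask-mode", "000")
    dataFrame.select("name", "age").save(ofile, "parquet")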

Cheers,
 Tomasz


PS: I've tried googling this problem and found a couple of similar reports, but no clear answer or solution.

PS2: For completeness: running the master/workers as a regular user solves the problem only for that user. For any other user submitting to this master, the result is shown in [2] below.


[0] Cluster details:
Master/workers: CentOS 6.5
Spark 1.3.1 prebuilt for Hadoop 2.4 (same behaviour with the 2.6 build)


[1]
##################################################################
File "/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
: java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_000001/part-r-00002.parquet; isDirectory=false; length=534; replication=1; blocksize=33554432; modification_time=1432042832000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-00002.parquet
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
        at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
        at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
        at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
        at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
##################################################################



[2]
##################################################################
15/05/19 14:45:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, wn23023.cis.gov.pl): java.io.IOException: Mkdirs failed to create file:/mnt/lustre/bigdata/med_home/tmp/test18/namesAndAges.parquet2/_temporary/0/_temporary/attempt_201505191445_0009_r_000000_0
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
        at parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:154)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:279)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
        at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
        at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
        at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
##################################################################
