That surprises me. Could you list the owner information of /mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
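On the cluster that check would just be `ls -ld` (or `stat`) against the reported path. A minimal, self-contained sketch of the same inspection, run against a scratch directory instead of the Lustre path (which is specific to the reporter's cluster):

```shell
#!/bin/sh
# Sketch: inspecting owner/group/mode of an output directory and its files.
# On the cluster the command of interest would be:
#   ls -ld /mnt/lustre/bigdata/med_home/tmp/test19EE/
# Here we use a scratch directory so the snippet is runnable anywhere.
dir=$(mktemp -d)
touch "$dir/part-r-00002.parquet"

# GNU stat format: %U = owner, %G = group, %a = octal mode, %n = name
stat -c '%U %G %a %n' "$dir" "$dir/part-r-00002.parquet"

rm -rf "$dir"
```

If the `_temporary` subtree under the output path shows `root root` while the submitting user is someone else, that matches the failure described below.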
On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes <tomasz.frub...@fuw.edu.pl> wrote:

> Dear Experts,
>
> We have a Spark cluster (standalone mode) in which the master and workers
> are started from the root account. Everything runs correctly up to the
> point where we try operations such as
>
>     dataFrame.select("name", "age").save(ofile, "parquet")
>
> or
>
>     rdd.saveAsPickleFile(ofile)
>
> where ofile is a path on a network-exported filesystem visible on all
> nodes (in our case Lustre; I guess the effect on NFS would be similar).
>
> Unsurprisingly, the temporary files created on the workers are owned by
> root, which then leads to a crash (see [1] below). Is there a solution or
> workaround for this (e.g. controlling the file-creation mode of the
> temporary files)?
>
> Cheers,
> Tomasz
>
> PS: I've tried to google this problem; there are a couple of similar
> reports, but no clear answer/solution.
>
> PS2: For completeness - running the master/workers as a regular user
> solves the problem only for that user. For other users submitting to this
> master, the result is given in [2] below.
>
> [0] Cluster details:
> Master/workers: CentOS 6.5
> Spark 1.3.1 prebuilt for Hadoop 2.4 (same behaviour with the 2.6 build)
>
> [1]
> ##################################################################
>   File "/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
> : java.io.IOException: Failed to rename
>   DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_000001/part-r-00002.parquet;
>   isDirectory=false; length=534; replication=1; blocksize=33554432;
>   modification_time=1432042832000; access_time=0; owner=; group=;
>   permission=rw-rw-rw-; isSymlink=false}
>   to file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-00002.parquet
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
>         at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
>         at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
>         at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
>         at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
>         at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
>         at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>         at py4j.Gateway.invoke(Gateway.java:259)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>         at java.lang.Thread.run(Thread.java:745)
> ##################################################################
>
> [2]
> ##################################################################
> 15/05/19 14:45:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3,
>   wn23023.cis.gov.pl): java.io.IOException: Mkdirs failed to create
>   file:/mnt/lustre/bigdata/med_home/tmp/test18/namesAndAges.parquet2/_temporary/0/_temporary/attempt_201505191445_0009_r_000000_0
>         at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
>         at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
>         at parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:154)
>         at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:279)
>         at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
>         at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
>         at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>         at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> ##################################################################
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For
> additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------
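Both traces come down to POSIX directory permissions: creating or renaming an entry requires write permission on the containing directory, so once root owns the `_temporary` tree, other users fail at commit time. A minimal sketch of the "Mkdirs failed" mode from [2], assuming a regular (non-root) user, since root bypasses the permission check:

```shell
#!/bin/sh
# Demonstrate the "Mkdirs failed" mode from [2]: without write permission
# on the parent directory, mkdir fails even for the directory's owner.
# (Root bypasses this check, so run as a regular user to see the failure.)
parent=$(mktemp -d)
chmod 555 "$parent"                 # r-x only: no new entries can be created

if mkdir "$parent/_temporary" 2>/dev/null; then
    echo "created"                  # only expected when running as root
else
    echo "Mkdirs failed to create $parent/_temporary"
fi

chmod 755 "$parent"                 # restore write access so cleanup works
rm -rf "$parent"
```

The rename failure in [1] is the same mechanism at the other end of the job: the commit step renames task output into the final directory, and that rename needs write access to both the source and destination directories.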