Thanks for the suggestion. I have tried playing with it; sc.sparkUser()
gives me the expected user name, but it doesn't solve the problem. From a
quick search through the Spark code it seems to me that this setting is
effective only for YARN and Mesos.
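For reference, this is roughly what I tested (a minimal sketch; the user
name is just an example, and I'm assuming SPARK_USER has to be set before
the SparkContext is created):

    import os
    from pyspark import SparkContext

    # assumption: SPARK_USER must be in the environment before the context starts
    os.environ["SPARK_USER"] = "tfruboes"

    sc = SparkContext(appName="spark-user-check")
    print(sc.sparkUser())  # prints the expected name, but workers still write as root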
I think a workaround could be using "--deploy-mode cluster" (not 100%
convenient, since it disallows any interactive work), but this is not
supported for Python-based programs.
Cheers,
Tomasz
On 20.05.2015 at 10:57, Iulian Dragoș wrote:
You could try setting `SPARK_USER` to the user under which your workers
are running. I couldn't find many references to this variable, but at
least YARN and Mesos take it into account when spawning executors.
Chances are that standalone mode does too.
iulian
On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes
<tomasz.frub...@fuw.edu.pl> wrote:
Hi,
thanks for the answer. The rights are:

    drwxr-xr-x 3 tfruboes all 5632 05-19 15:40 test19EE/
I have tried setting the rights to 777 on this directory prior to
execution. This does not get propagated down the chain, i.e. the
directory created as a result of the "save" call (namesAndAges.parquet2
in the path in dump [1] below) is created with drwxr-xr-x rights (owned
by the user submitting the job, i.e. tfruboes). The temp directories
created inside namesAndAges.parquet2/_temporary/0/
(e.g. task_201505200920_0009_r_000001) are owned by root, again with
drwxr-xr-x access rights.
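For reference, a small Python sketch to list the ownership down the
chain (using the same output path as in the dump):

    import os
    import pwd

    top = "/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2"
    for dirpath, dirnames, filenames in os.walk(top):
        st = os.stat(dirpath)
        owner = pwd.getpwuid(st.st_uid).pw_name
        # prints e.g. "0755 root .../_temporary/0/task_201505200920_0009_r_000001"
        print(oct(st.st_mode & 0o777), owner, dirpath)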
Cheers,
Tomasz
On 19.05.2015 at 23:56, Davies Liu wrote:
That surprises me; could you list the owner information of
/mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
<tomasz.frub...@fuw.edu.pl> wrote:
Dear Experts,
we have a Spark cluster (standalone mode) in which the master and
workers are started from the root account. Everything runs correctly up
to the point when we try operations such as

    dataFrame.select("name", "age").save(ofile, "parquet")

or

    rdd.saveAsPickleFile(ofile)

where ofile is a path on a network-exported filesystem (visible on all
nodes; in our case this is Lustre, I guess on NFS the effect would be
similar).
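For concreteness, a minimal complete example of the kind of job we run
(the data and app name here are just placeholders):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="save-repro")
    sqlContext = SQLContext(sc)

    # ofile points at the shared (Lustre/NFS) filesystem visible on all nodes
    ofile = "/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2"
    dataFrame = sqlContext.createDataFrame([("Alice", 30), ("Bob", 25)],
                                           ["name", "age"])
    dataFrame.select("name", "age").save(ofile, "parquet")  # Spark 1.3 API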
Unsurprisingly, temp files created on the workers are owned by root,
which then leads to a crash (see [1] below). Is there a
solution/workaround for this (e.g. controlling the file creation mode of
the temporary files)?
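To illustrate what I mean by "file creation mode": the permission bits of
newly created files are cut down by the process umask, so the relevant
setting would live in whatever environment launches the workers. A toy
Python illustration of the effect (driver-side only, not a fix):

    import os

    old = os.umask(0o000)  # clear the umask so group/other write bits survive
    with open("/tmp/umask_demo.txt", "w") as f:
        f.write("test")
    # with umask 000 a plain open() creates the file with mode 0666
    print(oct(os.stat("/tmp/umask_demo.txt").st_mode & 0o777))
    os.umask(old)  # restore the previous umask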
Cheers,
Tomasz
PS I've tried to google this problem; there are a couple of similar
reports, but no clear answer/solution found.
PS2 For completeness: running the master/workers as a regular user
solves the problem only for that user. For other users submitting to
this master the result is given in [2] below.
[0] Cluster details:
Master/workers: CentOS 6.5
Spark 1.3.1 prebuilt for Hadoop 2.4 (same behaviour for the 2.6 build)
[1]
##################################################################
  File "/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
: java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_000001/part-r-00002.parquet; isDirectory=false; length=534; replication=1; blocksize=33554432; modification_time=1432042832000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-00002.parquet
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
        at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
        at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
        at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
        at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
##################################################################
[2]
##################################################################
15/05/19 14:45:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, wn23023.cis.gov.pl): java.io.IOException: Mkdirs failed to create file:/mnt/lustre/bigdata/med_home/tmp/test18/namesAndAges.parquet2/_temporary/0/_temporary/attempt_201505191445_0009_r_000000_0
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
        at parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:154)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:279)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
        at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
        at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
        at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
##################################################################
--
Iulian Dragos
------
Reactive Apps on the JVM
www.typesafe.com
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org