Thanks for the suggestion. I have tried playing with it; sc.sparkUser()
gives me the expected user name, but it doesn't solve the problem. From a
quick search through the Spark code it seems to me that this setting is
effective only for YARN and Mesos.
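For reference, this is roughly what I tested (a minimal sketch; the user
name is just an example, and I'm assuming SPARK_USER has to be set before
the SparkContext is created):

    import os
    from pyspark import SparkContext

    # assumption: SPARK_USER must be in the environment before the context starts
    os.environ["SPARK_USER"] = "tfruboes"

    sc = SparkContext(appName="spark-user-check")
    print(sc.sparkUser())  # prints the expected name, but workers still write as root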
I think a workaround could be using "--deploy-mode cluster" (not 100%
convenient, since it disallows any interactive work), but this is not
supported for Python-based programs.
Cheers,
Tomasz
On 20.05.2015 at 10:57, Iulian Dragoș wrote:
You could try setting `SPARK_USER` to the user under which your workers
are running. I couldn't find many references to this variable, but at
least YARN and Mesos take it into account when spawning executors.
Chances are that standalone mode does too.
iulian
On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes
<tomasz.frub...@fuw.edu.pl> wrote:
Hi,
thanks for the answer. The rights are:

    drwxr-xr-x 3 tfruboes all 5632 05-19 15:40 test19EE/
I have tried setting the rights to 777 on this directory prior to
execution. This does not get propagated down the chain, i.e. the
directory created as a result of the "save" call (namesAndAges.parquet2
in the path in dump [1] below) is created with drwxr-xr-x rights (owned
by the user submitting the job, i.e. tfruboes). The temp directories
created inside namesAndAges.parquet2/_temporary/0/
(e.g. task_201505200920_0009_r_000001) are owned by root, again with
drwxr-xr-x access rights.
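For reference, a small Python sketch to list the ownership down the
chain (using the same output path as in the dump):

    import os
    import pwd

    top = "/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2"
    for dirpath, dirnames, filenames in os.walk(top):
        st = os.stat(dirpath)
        owner = pwd.getpwuid(st.st_uid).pw_name
        # prints e.g. "0755 root .../_temporary/0/task_201505200920_0009_r_000001"
        print(oct(st.st_mode & 0o777), owner, dirpath)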
Cheers,
Tomasz
On 19.05.2015 at 23:56, Davies Liu wrote:
That surprises me; could you list the owner information of
/mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
<tomasz.frub...@fuw.edu.pl> wrote:
Dear Experts,
we have a Spark cluster (standalone mode) in which the master and
workers are started from the root account. Everything runs correctly up
to the point when we try operations such as

    dataFrame.select("name", "age").save(ofile, "parquet")

or

    rdd.saveAsPickleFile(ofile)

where ofile is a path on a network-exported filesystem (visible on all
nodes; in our case this is Lustre, I guess on NFS the effect would be
similar).
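For concreteness, a minimal complete example of the kind of job we run
(the data and app name here are just placeholders):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="save-repro")
    sqlContext = SQLContext(sc)

    # ofile points at the shared (Lustre/NFS) filesystem visible on all nodes
    ofile = "/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2"
    dataFrame = sqlContext.createDataFrame([("Alice", 30), ("Bob", 25)],
                                           ["name", "age"])
    dataFrame.select("name", "age").save(ofile, "parquet")  # Spark 1.3 API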
Unsurprisingly, temp files created on the workers are owned by root,
which then leads to a crash (see [1] below). Is there a
solution/workaround for this (e.g. controlling the file creation mode of
the temporary files)?
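To illustrate what I mean by "file creation mode": the permission bits of
newly created files are cut down by the process umask, so the relevant
setting would live in whatever environment launches the workers. A toy
Python illustration of the effect (driver-side only, not a fix):

    import os

    old = os.umask(0o000)  # clear the umask so group/other write bits survive
    with open("/tmp/umask_demo.txt", "w") as f:
        f.write("test")
    # with umask 000 a plain open() creates the file with mode 0666
    print(oct(os.stat("/tmp/umask_demo.txt").st_mode & 0o777))
    os.umask(old)  # restore the previous umask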
Cheers,
Tomasz
PS I've tried to google this problem; there are a couple of similar
reports, but no clear answer/solution found.
PS2 For completeness: running the master/workers as a regular user
solves the problem only for that user. For other users submitting to
this master the result is given in [2] below.
[0] Cluster details:
Master/workers: CentOS 6.5
Spark 1.3.1 prebuilt for Hadoop 2.4 (same behaviour for the 2.6 build)
[1]
##################################################################
  File "/mnt/home/tfruboes/2015.05.SparkLocal/spark-1.3.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.save.
: java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/_temporary/0/task_201505191540_0009_r_000001/part-r-00002.parquet; isDirectory=false; length=534; replication=1; blocksize=33554432; modification_time=1432042832000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/mnt/lustre/bigdata/med_home/tmp/test19EE/namesAndAges.parquet2/part-r-00002.parquet
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
        at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
        at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:690)
        at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
        at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
        at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1181)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
##################################################################
[2]
##################################################################
15/05/19 14:45:19 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3, wn23023.cis.gov.pl): java.io.IOException: Mkdirs failed to create file:/mnt/lustre/bigdata/med_home/tmp/test18/namesAndAges.parquet2/_temporary/0/_temporary/attempt_201505191445_0009_r_000000_0
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
        at parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:154)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:279)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
        at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
        at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
        at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
##################################################################
--
Iulian Dragos
------
Reactive Apps on the JVM
www.typesafe.com
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org