Hi Alessandro,

You can look for a log line like this in your driver's output:
15/01/12 10:51:01 INFO storage.DiskBlockManager: Created local
directory at 
/data/yarn/nm/usercache/systest/appcache/application_1421081007635_0002/spark-local-20150112105101-4f3d
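If you have the driver output handy, you can pull the path out of that line with a quick grep. A small sketch, using the sample log line above as stand-in input (in practice you'd pipe in your real driver log, e.g. from `yarn logs -applicationId <appId>`):

```shell
# Sample driver log line (copied from above); substitute your real log.
log='15/01/12 10:51:01 INFO storage.DiskBlockManager: Created local directory at /data/yarn/nm/usercache/systest/appcache/application_1421081007635_0002/spark-local-20150112105101-4f3d'

# Extract just the local-directory path from the log line.
echo "$log" | grep -o '/[^ ]*spark-local[^ ]*'
```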

If you're deploying your application in cluster mode, the temp
directory will be under the YARN-defined application dir. In client
mode, the driver will create some data under spark.local.dir, but the
driver itself generally doesn't create many temp files IIRC.
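That's because on YARN, executors use the directories the NodeManager hands them via the LOCAL_DIRS environment variable, which overrides spark.local.dir; only a process running outside YARN falls back to spark.local.dir (or /tmp if nothing is set). A rough sketch of that precedence, with the function name and the `spark_local_dir` variable being illustrative, not real Spark code:

```shell
# Rough sketch of the directory precedence Spark applies on YARN.
# resolve_local_dir and spark_local_dir are made up for illustration.
resolve_local_dir() {
  if [ -n "$LOCAL_DIRS" ]; then          # set by the YARN NodeManager
    echo "$LOCAL_DIRS"
  elif [ -n "$SPARK_LOCAL_DIRS" ]; then  # env override
    echo "$SPARK_LOCAL_DIRS"
  else
    echo "${spark_local_dir:-/tmp}"      # spark.local.dir, default /tmp
  fi
}

# Inside a YARN container, LOCAL_DIRS wins:
LOCAL_DIRS=/mnt/pd1/hadoop/yarn/nm-local-dir resolve_local_dir
```

So a driver in client mode, which runs outside any YARN container, never sees LOCAL_DIRS and ends up in /tmp unless spark.local.dir (or SPARK_LOCAL_DIRS) is set for it.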


On Fri, Jan 9, 2015 at 11:32 PM, Alessandro Baretta
<alexbare...@gmail.com> wrote:
> Gents,
>
> I'm building spark using the current master branch and deploying in to
> Google Compute Engine on top of Hadoop 2.4/YARN via bdutil, Google's Hadoop
> cluster provisioning tool. bdutils configures Spark with
>
> spark.local.dir=/hadoop/spark/tmp,
>
> but this option is ignored in combination with YARN. Bdutils also configures
> YARN with:
>
>   <property>
>     <name>yarn.nodemanager.local-dirs</name>
>     <value>/mnt/pd1/hadoop/yarn/nm-local-dir</value>
>     <description>
>       Directories on the local machine in which to store application temp files.
>     </description>
>   </property>
>
> This is the right directory for spark to store temporary data in. Still,
> Spark is creating such directories as this:
>
> /tmp/spark-51388ee6-9de6-411d-b9b9-ab6f9502d01e
>
> and filling them up with gigabytes worth of output files, filling up the
> very small root filesystem.
>
> How can I diagnose why my Spark installation is not picking up the
> yarn.nodemanager.local-dirs from yarn?
>
> Alex



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
