Gents,
I'm building Spark from the current master branch and deploying it to
Google Compute Engine on top of Hadoop 2.4/YARN via bdutil, Google's Hadoop
cluster provisioning tool. bdutil configures Spark with
spark.local.dir=/hadoop/spark/tmp,
but this option is ignored in combination with YARN. bdutil also
configures YARN with:
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/mnt/pd1/hadoop/yarn/nm-local-dir</value>
<description>
Directories on the local machine in which to application temp files.
</description>
</property>
This is the right directory for Spark to store temporary data in. Still,
Spark is creating directories such as this one:
/tmp/spark-51388ee6-9de6-411d-b9b9-ab6f9502d01e
and filling them with gigabytes of output files, which exhausts the
very small root filesystem.
How can I diagnose why my Spark installation is not picking up
yarn.nodemanager.local-dirs from YARN?
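
One possible starting point is to check which environment variables the
process that creates /tmp/spark-* actually sees. The sketch below is only an
illustration, not Spark's own code. It assumes (my understanding, not
something verified against this build) that inside a YARN container Spark
reads the NodeManager's directories from the YARN_LOCAL_DIRS or LOCAL_DIRS
environment variables, detects the container through CONTAINER_ID, and
otherwise falls back to SPARK_LOCAL_DIRS, then spark.local.dir, then the JVM
temp directory (taken here to be /tmp). The function name
candidate_local_dirs is hypothetical.

```python
import os


def candidate_local_dirs(env, spark_local_dir=None):
    """Return (source, dirs) under the lookup order assumed in the text above."""
    if env.get("CONTAINER_ID"):
        # Assumption: a YARN container gets its local dirs from the NodeManager.
        for var in ("YARN_LOCAL_DIRS", "LOCAL_DIRS"):
            if env.get(var):
                return var, env[var].split(",")
    if env.get("SPARK_LOCAL_DIRS"):
        return "SPARK_LOCAL_DIRS", env["SPARK_LOCAL_DIRS"].split(",")
    if spark_local_dir:
        return "spark.local.dir", spark_local_dir.split(",")
    # Assumption: the JVM default temp directory is /tmp on these machines.
    return "java.io.tmpdir", ["/tmp"]


if __name__ == "__main__":
    # Run on the node (and as the process) that is filling up /tmp.
    source, dirs = candidate_local_dirs(dict(os.environ), "/hadoop/spark/tmp")
    print(source, dirs)
```

If this reports java.io.tmpdir or spark.local.dir for the driver process,
that would suggest the process writing to /tmp is not running inside a YARN
container at all (for example, a driver started in client mode), so the
NodeManager's directories would never be visible to it.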
Alex