I'm facing an issue where Spark temp files fill up /opt/spark-1.2.1/tmp on the local filesystem of the worker nodes. Which parameter/configuration sets that location?
The spark-env.sh file has the folders set as:

    export SPARK_HOME=/opt/spark-1.2.1
    export SPARK_WORKER_DIR=$SPARK_HOME/tmp

I could not find any parameter named SPARK_TMP_DIR in the stock Spark documentation, although DataStax's documentation does mention it. Once I know where this location can be set, I'm thinking of pointing it at an NFS mount so that more space is available and jobs don't fail because space runs out.

I could also see 'java.io.tmpdir' being set in spark-env.sh:

    -Djava.io.tmpdir=/opt/spark-1.2.1/tmp/spark-tmp

I tried overriding it with:

    export _JAVA_OPTIONS=-Djava.io.tmpdir=/new/tmp/dir

and then grepped again to see if it had been picked up. The log /opt/spark-1.2.1/logs/spark-hdplab-org.apache.spark.deploy.master.Master-1-node-01.out should have shown something like:

    Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/new/tmp/dir

but it is still seen pointing to /opt/spark-1.2.1/tmp only:

    $ export _JAVA_OPTIONS=-Djava.io.tmpdir=/new/tmp/dir
    $ grep -iR "/opt/spark-1.2.1/tmp" /opt/spark-1.2.1/*
    /opt/spark-1.2.1/logs/spark-hdplab-org.apache.spark.deploy.master.Master-1-node-01.out:Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/opt/spark-1.2.1/tmp/spark-tmp

I was also wondering whether setting SPARK_DAEMON_JAVA_OPTS, which Spark uses to pass JVM options to its daemons, would help here, even though http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts discusses it in a Spark standalone context.

---
Mufeed Usman
My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My Social Cause <http://www.vision2016.org.in/> | My Blogs: LiveJournal <http://mufeed.livejournal.com>
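For context, here is a sketch of how spark-env.sh could redirect the scratch space to an NFS mount. The path /mnt/nfs/spark-tmp is a made-up example; SPARK_LOCAL_DIRS (the env-var form of the spark.local.dir property) is the documented knob for shuffle files and RDD spill, while SPARK_WORKER_DIR covers the worker's per-application work dirs and logs:

```shell
# spark-env.sh (sketch; /mnt/nfs/spark-tmp is a hypothetical NFS mount point)
export SPARK_HOME=/opt/spark-1.2.1

# Per-application work dirs and executor stdout/stderr logs of the worker
export SPARK_WORKER_DIR=/mnt/nfs/spark-tmp/worker

# Scratch space for shuffle files and RDDs that spill to disk
# (standalone-mode equivalent of the spark.local.dir property)
export SPARK_LOCAL_DIRS=/mnt/nfs/spark-tmp/local
```

Note that shuffle-heavy jobs hit the local dirs hard, so an NFS-backed SPARK_LOCAL_DIRS trades disk-full failures for network I/O; whether that is acceptable depends on the workload.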
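One thing worth checking about the experiment above: an exported variable only affects JVMs started after the export, so a master or worker launched earlier (and the lines already in its log) will keep showing the old value, and a startup script that itself sets _JAVA_OPTIONS would reassert the old path on restart. A grep over the install tree can locate where the old -Djava.io.tmpdir is injected. The snippet below demonstrates the approach on a throwaway directory standing in for $SPARK_HOME (the conf-file content is a made-up stand-in; in practice you would grep /opt/spark-1.2.1/conf and /opt/spark-1.2.1/sbin directly):

```shell
# Stand-in for a real Spark install tree
demo=$(mktemp -d)
mkdir -p "$demo/conf"

# Simulate a conf file that hardcodes the old tmp dir into _JAVA_OPTIONS
echo 'export _JAVA_OPTIONS=-Djava.io.tmpdir=/opt/spark-1.2.1/tmp/spark-tmp' \
  > "$demo/conf/spark-env.sh"

# Locate every place the old path is injected (same grep as on a real install)
grep -R "java.io.tmpdir" "$demo/conf"

rm -rf "$demo"
```

After fixing whichever file the grep turns up, the daemons would need a restart (e.g. sbin/stop-all.sh followed by sbin/start-all.sh) before a fresh log line can show the new value.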