I think you can configure multiple disks through spark.local.dir (the default is /tmp). However, if your intermediate data is larger than the available disk space, you will still hit this issue.
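As a minimal sketch, spark.local.dir can be pointed at several disks by listing directories comma-separated; the mount points below are hypothetical placeholders, not paths from your cluster:

```
# spark-defaults.conf
# Spread Spark scratch space (shuffle files, disk-spilled RDDs) across
# multiple physical disks. /mnt/disk1 and /mnt/disk2 are example mounts.
spark.local.dir    /mnt/disk1/spark-tmp,/mnt/disk2/spark-tmp
```

The same value can also be passed at submit time, e.g. `--conf spark.local.dir=/mnt/disk1/spark-tmp,/mnt/disk2/spark-tmp`, though on Spark 1.0+ the cluster manager's SPARK_LOCAL_DIRS / LOCAL_DIRS environment variables take precedence.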
spark.local.dir (default: /tmp)
Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks. NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager.

2015-05-06 20:35 GMT+08:00 Yifan LI <iamyifa...@gmail.com>:
> Hi,
>
> I am running my GraphX application on Spark, but it failed because one
> executor node (on which the available HDFS space is small) reported the
> error "no space left on device".
>
> I can understand why it happened: my vertex(-attribute) RDD was growing
> bigger and bigger during the computation, so at some point the space
> requested on that node may have exceeded what was available.
>
> But is there any way to avoid this kind of error? I am sure that the
> overall disk space of all nodes is enough for my application.
>
> Thanks in advance!
>
> Best,
> Yifan LI