I think you can configure multiple disks through spark.local.dir; the default is /tmp. That said, if your intermediate data is larger than the total available disk space, you will still hit this issue.

spark.local.dir (default: /tmp)
Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks. NOTE: In Spark 1.0 and later this will be overridden by the SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager.
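For example, you could point it at several local disks in conf/spark-defaults.conf (the /mnt/disk1 and /mnt/disk2 paths below are only placeholders; use whatever local mount points your executor nodes actually have):

    # spark-defaults.conf: spread scratch/shuffle files across several local disks
    spark.local.dir  /mnt/disk1/spark-scratch,/mnt/disk2/spark-scratch

If you are on Spark 1.0+ under Standalone/Mesos/YARN, the cluster manager's environment variable takes precedence over the property, so the equivalent setting would go into conf/spark-env.sh on each worker instead (again, paths are just an example):

    # spark-env.sh: comma-separated list of local scratch directories
    export SPARK_LOCAL_DIRS=/mnt/disk1/spark-scratch,/mnt/disk2/spark-scratch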

2015-05-06 20:35 GMT+08:00 Yifan LI <iamyifa...@gmail.com>:

> Hi,
>
> I am running my GraphX application on Spark, but it failed because one
> executor node (on which the available HDFS space is small) hit a
> “no space left on device” error.
>
> I can understand why it happened, because my vertex(-attribute) RDD was
> becoming bigger and bigger during the computation…, so at some point the
> space requested on that node may have exceeded what was available.
>
> But, is there any way to avoid this kind of error? I am sure that the
> overall disk space of all nodes is enough for my application.
>
> Thanks in advance!
>
>
>
> Best,
> Yifan LI
>
