I think it depends on your workload and executor distribution: if the
workload is evenly distributed without any big data skew, and the
executors are evenly spread across the nodes, then the storage usage of
each node will be nearly the same. Spark itself cannot rebalance the
storage overhead at runtime, as you mentioned.
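
If you suspect skew is part of the problem, something along these lines
could show it quickly — just a rough sketch, `graph` and the partition
count below are placeholders for your own application:

// Rough sketch: count vertices per partition to see whether a few
// partitions (and hence a few executors' local disks) carry most of
// the data. `graph` is assumed to be your existing GraphX graph.
val partSizes = graph.vertices
  .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
  .collect()
partSizes.sortBy(-_._2).take(5).foreach { case (idx, n) =>
  println(s"partition $idx holds $n vertices")
}

// If the counts are heavily skewed, spreading the data over more,
// evenly sized partitions is one way to keep per-node spill similar
// (200 is just an example value, tune it to your cluster):
val evenedVertices = graph.vertices.repartition(200)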

2015-05-06 21:09 GMT+08:00 Yifan LI <iamyifa...@gmail.com>:

> Thanks, Shao. :-)
>
> I am wondering whether Spark will rebalance the storage overhead at
> runtime… since there is still some available space on the other nodes.
>
>
> Best,
> Yifan LI
>
>
>
>
>
> On 06 May 2015, at 14:57, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
> I think you could configure multiple disks through spark.local.dir; the
> default is /tmp. That said, if your intermediate data is larger than the
> available disk space, you will still hit this issue.
>
> spark.local.dir (default: /tmp)
> Directory to use for "scratch" space in Spark, including map output
> files and RDDs that get stored on disk. This should be on a fast, local
> disk in your system. It can also be a comma-separated list of multiple
> directories on different disks. NOTE: In Spark 1.0 and later this will
> be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS
> (YARN) environment variables set by the cluster manager.
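>
> For example, a rough sketch of setting it from the application side (the
> mount points are made up, point them at your real disks; and note that
> SPARK_LOCAL_DIRS / LOCAL_DIRS set by the cluster manager will take
> precedence, as mentioned above):
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> val conf = new SparkConf()
>   .setAppName("graphx-app")  // placeholder app name
>   // hypothetical mount points, comma-separated as the docs describe
>   .set("spark.local.dir", "/mnt/disk1/spark-tmp,/mnt/disk2/spark-tmp")
> val sc = new SparkContext(conf)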
>
> 2015-05-06 20:35 GMT+08:00 Yifan LI <iamyifa...@gmail.com>:
>
>> Hi,
>>
>> I am running my GraphX application on Spark, but it failed because one
>> executor node (on which the available HDFS space is small) hit a “no
>> space left on device” error.
>>
>> I can understand why it happened: my vertex(-attribute) RDD was growing
>> bigger and bigger during the computation… so at some point the space
>> requested on that node may have been larger than what was available.
>>
>> But, is there any way to avoid this kind of error? I am sure that the
>> overall disk space of all nodes is enough for my application.
>>
>> Thanks in advance!
>>
>>
>>
>> Best,
>> Yifan LI
>>
>>
>>
>>
>>
>>
>
>
