Yes, you are right. For now I have to say that both the workload and the 
executors are distributed evenly… so, as you said, it is difficult to improve 
the situation.

However, do you have any idea of how to create a *skewed* data/executor 
distribution? 
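
For instance, something along these lines (just a rough sketch: the class name 
and the 90/10 split below are made up for illustration, assuming a pair RDD 
keyed by Long):

import org.apache.spark.Partitioner

// Rough sketch: a custom Partitioner that routes ~90% of the keys to
// partition 0, giving an intentionally skewed distribution of data across
// partitions (and hence across the executors that hold them).
class SkewedPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = {
    val k = key.asInstanceOf[Long]
    if (k % 10L != 0L) 0                         // ~90% of the key space
    else (((k % parts) + parts) % parts).toInt   // the rest, spread evenly
  }
}

// Usage (hypothetical pair RDD `pairs`):
// val skewed = pairs.partitionBy(new SkewedPartitioner(16))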



Best,
Yifan LI





> On 06 May 2015, at 15:13, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> 
> I think it depends on your workload and executor distribution: if your 
> workload is evenly distributed without any big data skew, and the executors 
> are evenly distributed across the nodes, the storage usage of each node is 
> nearly the same. Spark itself cannot rebalance the storage overhead, as you 
> mentioned.
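> 
> A quick way to check that is to count the records per partition, e.g. (just 
> a sketch, assuming some existing RDD called `rdd`):
> 
> import org.apache.spark.rdd.RDD
> 
> // Sketch: print the record count of every partition of `rdd`, largest
> // first, so you can see whether the data is evenly distributed or skewed.
> def partitionSizes[T](rdd: RDD[T]): Unit = {
>   val sizes = rdd
>     .mapPartitionsWithIndex { (idx, iter) => Iterator((idx, iter.size)) }
>     .collect()
>   sizes.sortBy(-_._2).foreach { case (idx, n) =>
>     println(s"partition $idx: $n records")
>   }
> }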
> 
> 2015-05-06 21:09 GMT+08:00 Yifan LI <iamyifa...@gmail.com>:
> Thanks, Shao. :-)
> 
> I am wondering whether Spark will rebalance the storage overhead at 
> runtime… since there is still some available space on the other nodes.
> 
> 
> Best,
> Yifan LI
> 
> 
> 
> 
> 
>> On 06 May 2015, at 14:57, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>> 
>> I think you could configure multiple disks through spark.local.dir (the 
>> default is /tmp). That said, if your intermediate data is larger than the 
>> available disk space, you will still hit this issue.
>> 
>> From the docs: spark.local.dir (default: /tmp) is the directory to use for 
>> "scratch" space in Spark, including map output files and RDDs that get 
>> stored on disk. This should be on a fast, local disk in your system. It can 
>> also be a comma-separated list of multiple directories on different disks. 
>> NOTE: In Spark 1.0 and later this will be overridden by the 
>> SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment 
>> variables set by the cluster manager.
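>> 
>> For example (the paths here are just placeholders), in 
>> conf/spark-defaults.conf:
>> 
>>   spark.local.dir    /mnt/disk1/spark-local,/mnt/disk2/spark-local
>> 
>> or set SPARK_LOCAL_DIRS to the same comma-separated list in 
>> conf/spark-env.sh, since that environment variable takes precedence as 
>> noted above.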
>> 
>> 2015-05-06 20:35 GMT+08:00 Yifan LI <iamyifa...@gmail.com>:
>> Hi,
>> 
>> I am running my GraphX application on Spark, but it failed because one 
>> executor node (on which the available HDFS space is small) hit a "no space 
>> left on device" error.
>> 
>> I can understand why it happened: my vertex(-attribute) RDD was becoming 
>> bigger and bigger during the computation…, so at some point the space 
>> requested on that node probably exceeded what was available.
>> 
>> But is there any way to avoid this kind of error? I am sure that the 
>> overall disk space across all the nodes is enough for my application.
>> 
>> Thanks in advance!
>> 
>> 
>> 
>> Best,
>> Yifan LI
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
