Thanks for your help, Davidson!
I modified
val a: RDD[Int] = sc.parallelize(array).cache()
to keep "val a" an RDD of Int, but it gives the same result.

Another question: the JVM heap and Spark's storage memory sit in different parts of system memory. Spark code executes on the JVM, so an allocation like
val e = new Array[Int](2 * size) /* 8 MB */
uses JVM heap memory. If not cached, generated RDDs are written back to disk; if cached, they are copied into Spark's storage memory for further use. Is that right?
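To make the caching side of the question concrete, here is a minimal sketch of what I mean (local mode assumed; the names and sizes are just illustrative, mirroring the ~8 MB example above):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Minimal sketch (local mode), not production code.
val sc = new SparkContext("local", "memory-test")
val array = new Array[Int](2 * 1024 * 1024)      // 2M Ints, roughly 8 MB on the JVM heap
val a: RDD[Int] = sc.parallelize(array).cache()  // cache() == persist(MEMORY_ONLY)
a.count()                                        // action: materializes blocks in storage memory
```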

val RDD_1 = RDD_0.groupByKey()
The shuffle separates stages. Can anyone tell me the memory/disk usage of the shuffle-input RDD and the shuffle-output RDD, depending on whether RDD_0 and RDD_1 are cached or not?
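A sketch of the scenario I'm asking about (a key-value RDD is assumed here, since groupByKey needs pairs; the data is made up):

```scala
import org.apache.spark.SparkContext

val sc = new SparkContext("local", "shuffle-test")
val RDD_0 = sc.parallelize(Seq((1, "a"), (2, "b"), (1, "c")))
val RDD_1 = RDD_0.groupByKey()  // stage boundary: a shuffle sits between RDD_0 and RDD_1
RDD_1.count()                   // runs both stages; map-side shuffle output is written to local disk
```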

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/storage-MemoryStore-estimated-size-7-times-larger-than-real-tp4251p4256.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
