Sorry Davidson, I don't quite see your point. What's the essential difference
between our code?
/* my code */
val array = new Array[Int](size)
val a = sc.parallelize(array).cache()   // about 4MB of Int data

/* your code */
val numSlices = 8
val arr = Array.fill[Array[Int]](numSlices) { new Array[Int](size / numSlices) }
val rdd = sc.parallelize(arr, numSlices).cache()

I'm running in local mode with a single partition, so mine is just an RDD of one
partition with the type RDD[Int]. Your RDD has 8 partitions with the type
RDD[Array[Int]]. Does that matter?
My question is why the estimated memory usage is 7x the real size under sbt,
but correct in the Spark shell.
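
Here is a minimal, self-contained sketch of how the two setups could be compared
side by side (assumptions: size = 1 << 20, i.e. roughly 4MB of Ints, a plain
local master, and a Spark version where sc.getRDDStorageInfo is available; the
memSize it prints should correspond to what the storage.MemoryStore lines report
in the logs):

/* sketch only -- size, master and app name are assumed values */
import org.apache.spark.{SparkConf, SparkContext}

object CachedSizeCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("size-check"))
    val size = 1 << 20                         // ~4MB of raw Int data (assumed)

    // variant 1: RDD[Int], a single partition
    val a = sc.parallelize(new Array[Int](size), 1).cache()
    a.count()                                  // force the blocks into the MemoryStore

    // variant 2: RDD[Array[Int]], 8 partitions
    val numSlices = 8
    val arr = Array.fill[Array[Int]](numSlices) { new Array[Int](size / numSlices) }
    val rdd = sc.parallelize(arr, numSlices).cache()
    rdd.count()

    // what the storage layer reports for each cached RDD
    sc.getRDDStorageInfo.foreach { info =>
      println(s"RDD ${info.id}: ${info.numCachedPartitions} cached partitions, ${info.memSize} bytes in memory")
    }
    sc.stop()
  }
}

Running the same object from sbt and pasting its body into the Spark shell should
make it easy to see whether the reported sizes really differ between the two
environments.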

As for the following question, I made a mistake there, sorry.


