Sorry davidson, I don't get your point. What's the essential difference between our code?

/* my code */
val array = new Array[Int](size)
val a = sc.parallelize(array).cache()  /* ~4 MB of raw data */
/* your code */
val numSlices = 8
val arr = Array.fill[Array[Int]](numSlices) { new Array[Int](size / numSlices) }
val rdd = sc.parallelize(arr, numSlices).cache()

I'm running in local mode with only one partition, so mine is just a one-partition RDD of type RDD[Int], while yours has 8 partitions and type RDD[Array[Int]]. Does that matter?

My real question is why the estimated memory usage is 7x the real size under sbt, but correct in the Spark shell.

As to the following question, I made a mistake; sorry.
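For what it's worth, here is a minimal, self-contained sketch (my own code, not from this thread; LayoutSizeCheck and the RDD names are made up) that caches both layouts and prints the MemoryStore figures Spark itself reports through SparkContext.getRDDStorageInfo, so the two estimates can be compared side by side:

/* LayoutSizeCheck.scala -- a sketch, assuming a Spark version where
 * sc.getRDDStorageInfo is available; run it under sbt and compare with
 * the same lines pasted into spark-shell. */
import org.apache.spark.{SparkConf, SparkContext}

object LayoutSizeCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("layout-size-check"))
    val size = 1 << 20  /* 1M Ints, ~4 MB of raw data */

    /* layout 1: a single-partition RDD[Int] */
    val flat = sc.parallelize(new Array[Int](size)).setName("flat").cache()
    flat.count()  /* force the partition into the MemoryStore */

    /* layout 2: an 8-partition RDD[Array[Int]] */
    val numSlices = 8
    val chunked = sc.parallelize(
        Array.fill(numSlices)(new Array[Int](size / numSlices)), numSlices)
      .setName("chunked").cache()
    chunked.count()

    /* print what Spark thinks each cached RDD occupies in memory */
    sc.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: ${info.memSize} bytes in memory, " +
        s"${info.numCachedPartitions} cached partitions")
    }
    sc.stop()
  }
}

If I understand the MemoryStore correctly, a deserialized cached RDD[Int] holds one boxed java.lang.Integer per element, while RDD[Array[Int]] holds a few primitive arrays, so the two estimates can legitimately differ by several times for the same 4 MB of raw data; that is separate from the sbt-vs-shell discrepancy, though.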
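On the sbt-versus-shell gap itself, one difference worth ruling out (this is a guess on my part, not a confirmed diagnosis) is the JVM configuration: Spark's size estimation is sensitive to the architecture, heap size, and whether compressed oops are enabled, and sbt may run your code in a JVM with different flags than the one spark-shell launches. Here is a small HotSpot-only probe (JvmProbe is a made-up name) you can run in both environments:

/* JvmProbe.scala -- prints the JVM settings that size estimation
 * is sensitive to; paste into spark-shell, run under sbt, then diff. */
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

object JvmProbe {
  def main(args: Array[String]): Unit = {
    val diag = ManagementFactory.newPlatformMXBeanProxy(
      ManagementFactory.getPlatformMBeanServer,
      "com.sun.management:type=HotSpotDiagnostic",
      classOf[HotSpotDiagnosticMXBean])
    println("os.arch:           " + System.getProperty("os.arch"))
    println("max heap:          " + Runtime.getRuntime.maxMemory + " bytes")
    println("UseCompressedOops: " + diag.getVMOption("UseCompressedOops").getValue)
  }
}

If the two environments disagree on any of these, the same cached data can get a noticeably different size estimate before any of your own code runs.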