Thanks for your help, Davidson! I modified val a: RDD[Int] = sc.parallelize(array).cache() so that "val a" stays an RDD of Int, but I get the same result.
Another question: the JVM heap and Spark memory are located in different parts of system memory, Spark code is executed in JVM memory, and an allocation like val e = new Array[Int](2*size) /*8MB*/ uses JVM memory. If not cached, generated RDDs are written back to disk; if cached, RDDs are copied to Spark memory for further use. Is that right?

For val RDD_1 = RDD_0.groupByKey{...}, the shuffle separates stages. Can anyone tell me the memory/disk usage of the shuffle input RDD and the shuffle output RDD, under the conditions that RDD_0 and RDD_1 are cached or not?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/storage-MemoryStore-estimated-size-7-times-larger-than-real-tp4251p4256.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
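To make the scenario concrete, here is a minimal sketch of what I am asking about (names like RDD_0/RDD_1 and the key function x % 100 are just illustrative; this needs a running Spark installation):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object ShuffleMemorySketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("shuffle-memory-sketch").setMaster("local[2]"))

        // Cached RDD of Int: stored in Spark's storage memory inside the JVM heap
        val a = sc.parallelize(1 to (2 * 1024 * 1024)).cache()

        // groupByKey forces a shuffle boundary between two stages:
        // map-side shuffle output is written to local disk, and the
        // reduce side fetches those blocks to build RDD_1's partitions
        val RDD_0 = a.map(x => (x % 100, x))
        val RDD_1 = RDD_0.groupByKey()
        RDD_1.persist(StorageLevel.MEMORY_ONLY)

        RDD_1.count()  // action that actually triggers the shuffle
        sc.stop()
      }
    }

The question is what memory/disk each side of that shuffle consumes when a, RDD_0, or RDD_1 are cached versus not.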