No, only originalRDD is cached. Caching is set per RDD and is not inherited by RDDs derived from it, so you need to call cache() or persist() on every RDD you want cached.
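One way to check this yourself is to look at each RDD's storage level. The following is a minimal sketch, not something from the original thread: the object name CacheCheck, the local master setting, the sample data, and the map function are all made up for illustration. It relies on RDD.getStorageLevel and StorageLevel.NONE from the Spark API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical standalone check; names and data are illustrative only.
object CacheCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-check").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val originalRDD = sc.parallelize(1 to 100)
    originalRDD.cache()                      // marks only originalRDD for caching
    val derivedRDD = originalRDD.map(_ * 2)  // new RDD, starts with no storage level

    // An RDD that has not been marked for caching reports StorageLevel.NONE.
    println(originalRDD.getStorageLevel != StorageLevel.NONE) // true
    println(derivedRDD.getStorageLevel != StorageLevel.NONE)  // false

    // To keep the derived data in memory as well, mark it explicitly.
    derivedRDD.cache()
    derivedRDD.count()                       // an action materializes the cache

    sc.stop()
  }
}
```

Note that cache() only marks the RDD; nothing is stored until an action such as count() actually computes it.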

February 19, 2014 at 10:03 PM
When I persist/cache an RDD, are all the RDDs derived from it cached as well, or do I need to call cache individually on each RDD that I want cached?

For example:

val originalRDD = sc.parallelize(...)
originalRDD.cache
val derivedRDD = originalRDD.map()

Is derivedRDD cached in this case?
