Hi,
You're right: rdd3 is not fully cached, so it is re-computed every time it is used.
With MEMORY_AND_DISK, the partitions of rdd3 that don't fit in memory are
written to disk instead of being dropped.
Also, the current Spark does not automatically unpersist RDDs based on
how frequently they are used; you have to call unpersist() yourself.
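When the storage memory fills up, Spark evicts cached blocks in least-recently-used (LRU) order, which is why the blocks of the rarely used RDD are the ones dropped (and, without MEMORY_AND_DISK, re-computed later). A minimal sketch of that eviction policy in plain Python — this is not Spark code, and the class and block names are purely illustrative:

```python
from collections import OrderedDict

class LruBlockStore:
    """Toy model of LRU block eviction, similar in spirit to
    Spark's storage-memory eviction (illustrative only)."""

    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.used = 0
        self.blocks = OrderedDict()  # block_id -> size in GB

    def get(self, block_id):
        # A hit moves the block to the most-recently-used end.
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return True
        return False  # miss: Spark would re-compute the partition

    def put(self, block_id, size_gb):
        # Evict least-recently-used blocks until the new one fits.
        while self.used + size_gb > self.capacity and self.blocks:
            _evicted, sz = self.blocks.popitem(last=False)
            self.used -= sz
        self.blocks[block_id] = size_gb
        self.used += size_gb

store = LruBlockStore(capacity_gb=40)
store.put("rdd1", 20)
store.put("rdd2", 20)
store.get("rdd2")       # rdd2 is used repeatedly, rdd1 is not
store.put("rdd3", 20)   # needs room: rdd1 (least recently used) is evicted
print(sorted(store.blocks))  # ['rdd2', 'rdd3']
```

In real Spark code, since RDDs are not unpersisted automatically, you would call RDD1.unpersist() yourself once RDD1 is no longer needed, freeing its memory for the RDDs you actually use.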
On Fri, Feb 5, 2016 at 12:15 PM, charles li wrote:
> say I have 2 RDDs, RDD1 and RDD2,
> both 20 GB in memory,
> and I cache both of them in memory using RDD1.cache() and RDD2.cache().
> Then in the further steps of my app, I never use RDD1 but use RDD2 many
> times.
> Then here is my question:
> if there is only 40 GB of memory in my cluster, and here I