You should use `df.cache()`. `df.rdd.cache()` won't work, because `df.rdd` generates a new RDD from the original `df`, and it is that new RDD that gets cached, not the DataFrame itself.

On Fri, Oct 13, 2017 at 3:35 PM, Supun Nakandala <supun.nakand...@gmail.com> wrote:

> Hi all,
>
> I have been experimenting with cache/persist/unpersist methods with
> respect to both the DataFrame and RDD APIs. However, I am experiencing
> different behavior in the DataFrame API compared to the RDD API, such as
> DataFrames not getting cached when count() is called.
>
> Is there a difference between how these operations act with respect to
> the DataFrame and RDD APIs?
>
> Thank You.
> -Supun