AFAIK, cache() is just a shortcut for the persist() method with MEMORY_ONLY as the storage level.
From the source code of RDD:

    /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
    def persist(): RDD[T] = persist(StorageLevel.MEMORY_ONLY)

    /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
    def cache(): RDD[T] = persist()

2014-04-13 16:26 GMT+02:00 Joe L <selme...@yahoo.com>:
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/what-is-the-difference-between-persist-and-cache-tp4181.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
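To make the delegation concrete, here is a small self-contained Scala sketch. It deliberately does not use Spark: ToyRDD and ToyStorageLevel are hypothetical stand-ins that only mimic the two methods quoted above, plus the check Spark's persist(newLevel) performs that refuses to switch an RDD to a different storage level once one has been assigned.

```scala
// Toy model, NOT Spark's API: ToyStorageLevel and ToyRDD are hypothetical
// names used only to illustrate how cache() delegates to persist().
sealed trait ToyStorageLevel
object ToyStorageLevel {
  case object NONE extends ToyStorageLevel
  case object MEMORY_ONLY extends ToyStorageLevel
  case object MEMORY_AND_DISK extends ToyStorageLevel
}

class ToyRDD[T](val data: Seq[T]) {
  private var storageLevel: ToyStorageLevel = ToyStorageLevel.NONE

  /** Mark this RDD for persistence at newLevel; returns this for chaining. */
  def persist(newLevel: ToyStorageLevel): ToyRDD[T] = {
    // Mirrors Spark's check: a level, once assigned, cannot be changed.
    if (storageLevel != ToyStorageLevel.NONE && newLevel != storageLevel)
      throw new UnsupportedOperationException(
        "Cannot change storage level of an RDD after it was already assigned a level")
    storageLevel = newLevel
    this
  }

  /** No-argument persist() uses the default level, MEMORY_ONLY. */
  def persist(): ToyRDD[T] = persist(ToyStorageLevel.MEMORY_ONLY)

  /** cache() adds nothing of its own: it simply calls persist(). */
  def cache(): ToyRDD[T] = persist()

  def getStorageLevel: ToyStorageLevel = storageLevel
}

object CacheDemo {
  def main(args: Array[String]): Unit = {
    val cached = new ToyRDD(1 to 10).cache()
    println(cached.getStorageLevel) // MEMORY_ONLY

    val onDisk = new ToyRDD(Seq("a", "b")).persist(ToyStorageLevel.MEMORY_AND_DISK)
    println(onDisk.getStorageLevel) // MEMORY_AND_DISK

    // cache() has already fixed the level to MEMORY_ONLY, so this is rejected.
    try new ToyRDD(Seq(1)).cache().persist(ToyStorageLevel.MEMORY_AND_DISK)
    catch { case e: UnsupportedOperationException => println("rejected: " + e.getMessage) }
  }
}
```

With a real Spark RDD the corresponding calls are rdd.cache(), rdd.persist(StorageLevel.MEMORY_AND_DISK) and rdd.getStorageLevel. So in practice: use cache() when MEMORY_ONLY is what you want, and persist(level) when you need any other storage level.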