That's correct, as long as you don't change the StorageLevel.
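For reference, the guard at the linked line of RDD.scala looks roughly like this (paraphrased sketch, not a verbatim copy): repeated persist()/cache() calls with the same StorageLevel are harmless no-ops, but trying to change the level of an already-persisted RDD throws.

```scala
// Sketch of the check inside RDD.persist (see the linked source):
def persist(newLevel: StorageLevel): this.type = {
  // Once a storage level is assigned, it cannot be changed.
  if (storageLevel != StorageLevel.NONE && newLevel != storageLevel) {
    throw new UnsupportedOperationException(
      "Cannot change storage level of an RDD after it was already assigned a level")
  }
  // ... registers this RDD with the given storage level ...
  this
}
```

So calling cache() on the same table's DataFrame several times only marks it for caching once; the data itself is materialized on the first action that touches it.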
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L166

Yong

________________________________
From: Rabin Banerjee <dev.rabin.baner...@gmail.com>
Sent: Friday, November 18, 2016 10:36 AM
To: user; Mich Talebzadeh; Tathagata Das
Subject: Will spark cache table once even if I call read/cache on the same table multiple times

Hi All,

I am working on a project where the code is divided into multiple reusable modules, and I am not able to understand how Spark persist/cache behaves in that context.

My question is: will Spark cache a table once even if I call read/cache on the same table multiple times?

Sample code:

TableReader:

    def getTableDF(tableName: String, persist: Boolean = false): DataFrame = {
      val tableDF = sqlContext.table(tableName)
      if (persist) {
        tableDF.cache()
      }
      tableDF
    }

Module1:

    val emp = TableReader.getTableDF("employee")
    emp.someTransformation.someAction

Module2:

    val emp = TableReader.getTableDF("employee")
    emp.someTransformation.someAction

...

ModuleN:

    val emp = TableReader.getTableDF("employee")
    emp.someTransformation.someAction

Will Spark cache the emp table once, or will it cache it every time I call getTableDF? Shall I maintain a global hashmap to handle that, something like Map[String, DataFrame]?

Regards,
Rabin Banerjee
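If you do want to avoid even the repeated sqlContext.table()/cache() calls, the Map[String, DataFrame] idea can be sketched like this (a hypothetical variant of the TableReader above; it assumes a sqlContext is in scope and uses a thread-safe TrieMap so concurrent modules don't race):

```scala
import scala.collection.concurrent.TrieMap
import org.apache.spark.sql.DataFrame

// Hypothetical memoizing reader: getOrElseUpdate evaluates the body at most
// once per table name, so table() and cache() run a single time even when
// many modules request the same table.
object TableReader {
  private val loaded = TrieMap.empty[String, DataFrame]

  def getTableDF(tableName: String, persist: Boolean = false): DataFrame =
    loaded.getOrElseUpdate(tableName, {
      val df = sqlContext.table(tableName)  // assumes sqlContext in scope
      if (persist) df.cache()
      df
    })
}
```

That said, the map is an optimization rather than a correctness requirement: even without it, calling cache() repeatedly on the same table does not duplicate the cached data, since the storage level can only be assigned once.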