Hi All,

I am working on a project where the code is divided into multiple reusable modules, and I am not able to understand how Spark persist/cache behaves in that context.
My question is: will Spark cache a table only once, even if I call read/cache on the same table multiple times?

Sample code:

TableReader:

    def getTableDF(tableName: String, persist: Boolean = false): DataFrame = {
      val tabDf = sqlContext.table(tableName)
      if (persist) {
        tabDf.cache()
      }
      tabDf   // was `return tableDF`, which does not compile (undefined name)
    }

Module1:

    val emp = TableReader.getTableDF("employee", persist = true)
    emp.someTransformation.someAction

Module2:

    val emp = TableReader.getTableDF("employee", persist = true)
    emp.someTransformation.someAction

...

ModuleN:

    val emp = TableReader.getTableDF("employee", persist = true)
    emp.someTransformation.someAction

Will Spark cache the emp table once, or will it cache it every time I call getTableDF? Should I maintain a global hashmap to handle that, something like Map[String, DataFrame]?

Regards,
Rabin Banerjee
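By a "global hashmap" I mean something like the sketch below. This is only an illustration of the pattern, not working Spark code: the `TableRegistry` name and the `loads` counter are made up for this example, and `DataFrame` is stood in for by `String` so the snippet runs without a Spark cluster. In the real code the value type would be `org.apache.spark.sql.DataFrame` and `load` would call `sqlContext.table(name).cache()`.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical registry: each table is loaded (and would be cached) at most
// once per JVM, no matter how many modules ask for it.
object TableRegistry {
  // Thread-safe map; TrieMap.getOrElseUpdate is atomic.
  private val cache = TrieMap.empty[String, String]

  // Counts actual loads, purely to illustrate the memoization.
  var loads = 0

  // Stand-in for `sqlContext.table(name).cache()`.
  private def load(name: String): String = {
    loads += 1
    s"df($name)"
  }

  // Every module would call this instead of reading the table directly.
  def getTable(name: String): String =
    cache.getOrElseUpdate(name, load(name))
}
```

With this, Module1 through ModuleN would all call `TableRegistry.getTable("employee")` and share one cached DataFrame instead of triggering a fresh read each time.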