Hi all,

  I am working on a project where the code is divided into multiple reusable
modules, and I am not able to understand how Spark persist/cache behaves in that context.

My question is: will Spark cache a table only once, even if I call read/cache
on the same table multiple times?

 Sample Code ::

  TableReader::

   def getTableDF(tableName: String, persist: Boolean = false): DataFrame = {
     val tabDF = sqlContext.table(tableName)
     if (persist) {
       tabDF.cache() // lazy: marks for caching, materialized on the first action
     }
     tabDF
   }

Now, in the modules:
Module1::
 val emp = TableReader.getTableDF("employee", persist = true)
 emp.someTransformation.someAction

Module2::
 val emp = TableReader.getTableDF("employee", persist = true)
 emp.someTransformation.someAction

....

ModuleN::
 val emp = TableReader.getTableDF("employee", persist = true)
 emp.someTransformation.someAction

Will Spark cache the emp table once, or will it cache it every time I call
the method? Should I maintain a global HashMap to handle that, something like
Map[String, DataFrame]?
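
For example, here is a rough sketch of what I have in mind, assuming a single
sqlContext in scope (as in the snippet above) and single-threaded access.
TableRegistry is just a name I made up for illustration, not an existing API:

   import scala.collection.mutable
   import org.apache.spark.sql.DataFrame

   // Hypothetical registry: hands out one DataFrame per table name and
   // requests cache() at most once per table.
   object TableRegistry {
     private val cachedTables = mutable.Map.empty[String, DataFrame]

     def getTableDF(tableName: String): DataFrame =
       cachedTables.getOrElseUpdate(tableName, {
         val df = sqlContext.table(tableName)
         df.cache() // lazy: the data is materialized on the first action
         df
       })
   }

With this, Module1 through ModuleN would all receive the same DataFrame
instance, so cache() would only ever be requested once per table. Is this
necessary, or does Spark already handle it?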

 Regards,
Rabin Banerjee
