Hi, you can check my StackOverflow question: http://stackoverflow.com/questions/36195105/what-happens-if-i-cache-the-same-rdd-twice-in-spark/36195812#36195812
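In short: DataFrame caching is keyed on the analyzed logical plan, not on
rdd.id, so reading and caching the same table twice reuses the one cache
entry. A minimal sketch of what I mean (written against the 1.6/2.x-era
SQLContext API; the table name "employee" is just taken from your example):

    val df1 = sqlContext.table("employee")
    df1.cache()                 // registers the plan with the CacheManager
    df1.count()                 // first action materializes the in-memory cache

    val df2 = sqlContext.table("employee")  // fresh DataFrame, fresh rdd.id
    df2.cache()                 // no-op: this analyzed plan is already cached
    df2.count()                 // served from the cache, no second disk read

    // both plans resolve to the same cached InMemoryRelation
    println(df1.queryExecution.withCachedData)
    println(df2.queryExecution.withCachedData)
    println(sqlContext.isCached("employee"))  // true

So no global Map[String, DataFrame] should be needed: the CacheManager
already deduplicates by plan, and the second cache() just logs "Asked to
cache already cached data.".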
On Sat, Nov 19, 2016 at 3:16 AM, Rabin Banerjee <dev.rabin.baner...@gmail.com> wrote:

> Hi Yong,
>
> But every time val tabdf = sqlContext.table(tablename) is called, tabdf.rdd
> has a new id, which can be checked by calling tabdf.rdd.id.
> And,
> https://github.com/apache/spark/blob/b6de0c98c70960a97b07615b0b08fbd8f900fbe7/core/src/main/scala/org/apache/spark/SparkContext.scala#L268
>
> Spark is maintaining a Map of [RDD_ID, RDD], so as the RDD id changes,
> will Spark cache the same data again and again?
>
> For example,
>
> val tabdf = sqlContext.table("employee")
> tabdf.cache()
> tabdf.someTransformation.someAction
> println(tabdf.rdd.id)
> val tabdf1 = sqlContext.table("employee")
> tabdf1.cache() <= *Will Spark again go to disk, read and load the data
> into memory, or look into the cache?*
> tabdf1.someTransformation.someAction
> println(tabdf1.rdd.id)
>
> Regards,
> R Banerjee
>
>
> On Fri, Nov 18, 2016 at 9:14 PM, Yong Zhang <java8...@hotmail.com> wrote:
>
>> That's correct, as long as you don't change the StorageLevel.
>>
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L166
>>
>> Yong
>>
>> ------------------------------
>> *From:* Rabin Banerjee <dev.rabin.baner...@gmail.com>
>> *Sent:* Friday, November 18, 2016 10:36 AM
>> *To:* user; Mich Talebzadeh; Tathagata Das
>> *Subject:* Will spark cache table once even if I call read/cache on the
>> same table multiple times
>>
>> Hi All,
>>
>> I am working on a project where the code is divided into multiple
>> reusable modules. I am not able to understand Spark persist/cache in that
>> context.
>>
>> My question is: will Spark cache a table once, even if I call read/cache
>> on the same table multiple times?
>>
>> Sample code:
>>
>> TableReader:
>>
>> def getTableDF(tablename: String, persist: Boolean = false): DataFrame = {
>>   val tabdf = sqlContext.table(tablename)
>>   if (persist) {
>>     tabdf.cache()
>>   }
>>   tabdf
>> }
>>
>> Now
>>
>> Module1:
>> val emp = TableReader.getTableDF("employee")
>> emp.someTransformation.someAction
>>
>> Module2:
>> val emp = TableReader.getTableDF("employee")
>> emp.someTransformation.someAction
>>
>> ....
>>
>> ModuleN:
>> val emp = TableReader.getTableDF("employee")
>> emp.someTransformation.someAction
>>
>> Will Spark cache the emp table once, or will it cache it every time I
>> call it? Should I maintain a global hashmap to handle that? Something
>> like Map[String, DataFrame]?
>>
>> Regards,
>> Rabin Banerjee

-- 
*___________________*
Quant | Engineer | Boy
*___________________*
*blog*: http://litaotao.github.io
*github*: www.github.com/litaotao