Whenever Spark reads data, it keeps it in executor memory and reuses it until there is no room left for newly read or processed data. This is the beauty of Spark.
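If you want the first run to benefit as well, one option is to persist the DataFrames explicitly before the first action, so every subsequent action reuses the cached partitions instead of re-reading Parquet. A minimal sketch (paths and app name are placeholders, not from the original thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("IntersectDemo").getOrCreate()

// Read both Parquet datasets (placeholder paths).
val bigDF   = spark.read.parquet("/data/big_150m")   // ~150M rows
val smallDF = spark.read.parquet("/data/small_10m")  // ~10M rows

// Persist in executor memory before the first action, so the read cost
// is paid once; later actions hit the cache instead of the files.
bigDF.cache()
smallDF.cache()

val common = bigDF.intersect(smallDF)
println(common.count())
```

Note that `cache()` is lazy: the data is materialized on the first action, so the first `count()` still pays the full read, but anything after it should see timings closer to your second run.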
On Tue, Apr 3, 2018 at 12:42 AM snjv <snjv.workm...@gmail.com> wrote:
> Hi,
>
> When we execute the same operation twice, Spark takes less time (~40% less)
> than the first run. Our operation is like this:
> Read 150M rows (spread across multiple Parquet files) into a DF.
> Read 10M rows (spread across multiple Parquet files) into another DF.
> Do an intersect operation.
>
> Size of the 150M-row files: 587 MB
> Size of the 10M-row files: 50 MB
>
> If the first execution takes around 20 sec, the next one takes just 10-12 sec.
> Any specific reason for this? Is there any optimization that we can
> utilize during the first operation?
>
> Regards
> Sanjeev

--
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/