It depends on your use case, but broadcasting <https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#broadcast-variables> could be a better option.
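
A rough sketch of what that could look like, in case it helps. It assumes the list of file names is small enough to collect on the driver; the paths, the column name `fileName` and the `allowedSuffixes` set are made up for illustration. The session itself stays on the driver, and only small read-only data is broadcast to the executors:

```scala
import org.apache.spark.sql.SparkSession

object SampleJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SampleJob")
      .config("spark.master", "local")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the DataFrame of HDFS file names.
    val df = Seq("hdfs:///data/a.json", "hdfs:///data/b.json").toDF("fileName")

    // Small read-only data that executor-side code needs can be broadcast.
    // (Assumption: the job only wants files with these suffixes.)
    val allowedSuffixes = spark.sparkContext.broadcast(Set(".json"))

    // This closure runs on the executors and only touches the broadcast value,
    // never the SparkSession.
    val wanted = df.as[String]
      .filter(name => allowedSuffixes.value.exists(s => name.endsWith(s)))

    // Bring the (small) list of paths back to the driver, where `spark` is usable.
    val paths: Array[String] = wanted.collect()

    // DataFrameReader.json accepts several paths; the read itself is still
    // distributed across the executors.
    val data = spark.read.json(paths: _*)

    // ...... some logic here, expressed as DataFrame operations on `data` ......
    data.printSchema()

    spark.stop()
  }
}
```

If each file really has to be handled separately, looping over `paths` on the driver and calling `spark.read.json(path)` per file is another option, at the cost of launching one job per file.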
On Thu, Sep 21, 2017 at 2:03 PM, Chackravarthy Esakkimuthu <chaku.mi...@gmail.com> wrote:
> Hi,
>
> I want to know how to pass sparkSession from driver to executor.
>
> I have a spark program (batch job) which does following,
>
> #################
>
> val spark = SparkSession.builder().appName("SampleJob").config("spark.master", "local").getOrCreate()
>
> val df = this is dataframe which has list of file names (hdfs)
>
> df.foreach { fileName =>
>
>   *spark.read.json(fileName)*
>
>   ...... some logic here....
> }
>
> #################
>
> *spark.read.json(fileName) --- this fails as it runs in executor. When I
> put it outside foreach, i.e. in driver, it works.*
>
> As I am trying to use spark (sparkSession) in executor which is not
> visible outside driver. But I want to read hdfs files inside foreach, how
> do I do it.
>
> Can someone help how to do this.
>
> Thanks,
> Chackra