It depends on your use case, but broadcasting
<https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#broadcast-variables>
could be a better option.
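
To make that a bit more concrete, here is a rough sketch, not a definitive implementation. It assumes the list of file names is small enough to collect to the driver and that the files share a compatible JSON schema. The paths, the fileName column and the allowedSuffix value are hypothetical and only there for illustration. A broadcast variable carries small read-only data to the executors; the SparkSession itself lives only on the driver, so the actual read happens there.

```scala
import org.apache.spark.sql.SparkSession

object SampleJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SampleJob")
      .config("spark.master", "local")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the DataFrame that holds the HDFS file names.
    val df = Seq("hdfs:///data/a.json", "hdfs:///data/b.json").toDF("fileName")

    // Broadcast a small read-only value once; executors access it via .value.
    // (allowedSuffix is an illustrative assumption, not part of the original job.)
    val allowedSuffix = spark.sparkContext.broadcast(".json")

    // This filter runs on the executors. It uses only the broadcast value,
    // never the SparkSession.
    val jsonFiles = df.as[String].filter(name => name.endsWith(allowedSuffix.value))

    // Back on the driver: collect the (small) list of names and read them here.
    val paths = jsonFiles.collect()
    val data = spark.read.json(paths: _*)
    data.printSchema()

    spark.stop()
  }
}
```

The read only succeeds if the paths actually exist. If each file needs its own processing, looping over paths on the driver and calling spark.read.json(path) per file would be another option under the same assumptions.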

On Thu, Sep 21, 2017 at 2:03 PM, Chackravarthy Esakkimuthu <
chaku.mi...@gmail.com> wrote:

> Hi,
>
> I want to know how to pass sparkSession from driver to executor.
>
> I have a Spark program (batch job) which does the following:
>
> #################
>
> val spark = SparkSession.builder().appName("SampleJob")
>   .config("spark.master", "local").getOrCreate()
>
> val df = ... // a DataFrame holding the list of file names (HDFS)
>
> df.foreach { fileName =>
>
>       *spark.read.json(fileName)*
>
>       ...... some logic here....
> }
>
> #################
>
>
> *spark.read.json(fileName) --- this fails because it runs in an executor.
> When I put it outside foreach, i.e. in the driver, it works.*
>
> I am trying to use spark (the SparkSession) in an executor, but it is not
> available outside the driver. I still want to read the HDFS files inside
> foreach; how do I do that?
>
> Can someone help me with this?
>
> Thanks,
> Chackra
>
