Re: Loading objects only once

2017-09-28 Thread Eike von Seggern
Hello,

maybe broadcast can help you here [1]. You can load the model once on the driver and then broadcast it to the workers with `bc_model = sc.broadcast(model)`. In the map function you can then access the model via `bc_model.value` (in PySpark, `value` is an attribute, not a method call).

Best
Eike

[1] https://spark.apache.org/docs/latest/api/pytho
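A minimal PySpark sketch of the broadcast approach (`ToyModel` and its `predict` method are illustrative stand-ins, not from the thread):

```python
from pyspark import SparkContext

sc = SparkContext(appName="broadcast-model")

# Toy stand-in for a real model object loaded on the driver.
class ToyModel:
    def predict(self, x):
        return x * 2

model = ToyModel()              # loaded once, on the driver
bc_model = sc.broadcast(model)  # serialized and shipped to each executor once

# Broadcast.value is an attribute in PySpark, not a method.
results = sc.parallelize([1, 2, 3]).map(lambda x: bc_model.value.predict(x)).collect()
print(results)  # [2, 4, 6]
```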

RE: Loading objects only once

2017-09-28 Thread JG Perrin
Maybe place the model on each executor’s disk and load it from there? Depending on how you use the data/model, something like Livy and sharing the same connection may also help.
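A minimal sketch of the executor-local-disk idea, assuming the model file has already been placed at the same local path on every executor; the path and the read-as-deserialization step are hypothetical:

```python
from pyspark import SparkContext

sc = SparkContext(appName="executor-local-model")

MODEL_PATH = "/opt/models/model-filename"  # hypothetical local path on every executor

_model = None  # module-level cache: one load per Python worker process

def get_model():
    global _model
    if _model is None:
        with open(MODEL_PATH) as f:  # stand-in for real model deserialization
            _model = f.read()
    return _model

def score(record):
    model = get_model()  # loads on first call in each worker, cached afterwards
    return (record, len(model))

results = sc.parallelize(range(4)).map(score).collect()
```

Note that PySpark executors run Python workers as separate processes, so "load once per executor" in practice means once per worker process.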

Re: Loading objects only once

2017-09-28 Thread Vadim Semenov
As an alternative:

```
spark-submit --files
```

The files will be put in the working directory of each executor, so you can then load them alongside your `map` function. Behind the scenes it uses the `SparkContext.addFile` method, which you can also use directly:
https://github.com/apache/spark/blob/master/core/src/m
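A PySpark sketch of the same idea, pairing `sc.addFile` with `SparkFiles.get` and `mapPartitions` so the file is read once per partition instead of once per record (the S3 path is the hypothetical one used later in this thread):

```python
from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="addfile-model")

# Equivalent to passing the file via spark-submit --files.
sc.addFile("s3://bucket/path/model-filename")

def score_partition(records):
    # SparkFiles.get resolves the local copy of the distributed file
    # in the executor's working directory.
    path = SparkFiles.get("model-filename")
    with open(path) as f:  # stand-in for real model deserialization
        model = f.read()   # read once per partition
    for record in records:
        yield (record, len(model))

results = sc.parallelize(range(4), 2).mapPartitions(score_partition).collect()
```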

Re: Loading objects only once

2017-09-28 Thread Vadim Semenov
Something like this:

```scala
object Model {
  // @transient + lazy: the model is loaded at most once per executor JVM,
  // on first access, and is not serialized into the task closure
  @transient lazy val modelObject = new ModelLoader("model-filename")

  def get() = modelObject
}

object SparkJob {
  def main(args: Array[String]) = {
    sc.addFile("s3://bucket/path/model-filename")
    sc.parallelize(…).map(test => {
      Model.get()
      …
    })
  }
}
```
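For context on the pattern above: a `lazy val` inside a Scala `object` is constructed on first access and at most once per JVM (i.e. per executor), while `@transient` keeps the model object out of the serialized task closure, so only the lightweight `Model` reference travels with the tasks.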