In my Spark Streaming application, I need to build a graph from a file, and initializing that graph takes between 5 and 10 seconds.
So I tried initializing it once per executor so that it would only be initialized once. After running the application, I noticed that it was initialized far more than once per executor, each time with a different process ID (every process has its own logger).

Doesn't every executor have its own JVM and its own process? Or is that only relevant when I develop in JVM languages like Scala/Java? Do executors in PySpark spawn new processes for new tasks? And if they do, how can I make sure that my graph object really gets initialized only once?

Thanks :)
Sidney Feiner / SW Developer
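P.S. For context, here is roughly the pattern I tried: a module-level singleton that builds the graph lazily on first use. build_graph(), the file path, and the per-record work are simplified stand-ins for my actual code:

    # graph_loader.py -- shipped to every executor (e.g. via --py-files)
    import os

    def build_graph(path):
        # Stand-in for my real loading code, which takes 5-10 seconds.
        with open(path) as f:
            return f.read()

    _graph = None  # module-level cache, meant to live once per process

    def get_graph(path):
        # Build the graph on first use in this process, then reuse it.
        global _graph
        if _graph is None:
            print("Initializing graph in pid %d" % os.getpid())
            _graph = build_graph(path)
        return _graph

And in the streaming job, each partition fetches the cached graph:

    from graph_loader import get_graph

    def process_partition(records):
        graph = get_graph("/data/graph.txt")  # example path
        for record in records:
            yield (record, len(graph))  # placeholder per-record work

    # stream is my DStream
    stream.foreachRDD(lambda rdd: rdd.mapPartitions(process_partition).count())

I expected the "Initializing graph" line to be printed once per executor, but it shows up many more times, each with a different pid.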