In my Spark Streaming application, I need to build a graph from a file, and initializing that graph takes between 5 and 10 seconds.
So I tried initializing it once per executor so that it would only be initialized once. After running the application, I noticed that it was initialized far more than once per executor, each time with a different process ID (every process has its own logger).

Doesn't every executor have its own JVM and its own process? Or is that only relevant when I develop in JVM languages like Scala/Java? Do executors in PySpark spawn new processes for new tasks? And if they do, how can I make sure that my graph object really gets initialized only once?

Thanks :)
Sidney Feiner / SW Developer
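P.S. For context, here is roughly the pattern I tried: a module-level singleton that builds the graph lazily on first use. build_graph(), the file path, and the per-record work are simplified stand-ins for my actual code:

    # graph_loader.py -- shipped to every executor (e.g. via --py-files)
    import os

    def build_graph(path):
        # Stand-in for my real loading code, which takes 5-10 seconds.
        with open(path) as f:
            return f.read()

    _graph = None  # module-level cache, meant to live once per process

    def get_graph(path):
        # Build the graph on first use in this process, then reuse it.
        global _graph
        if _graph is None:
            print("Initializing graph in pid %d" % os.getpid())
            _graph = build_graph(path)
        return _graph

And in the streaming job, each partition fetches the cached graph:

    from graph_loader import get_graph

    def process_partition(records):
        graph = get_graph("/data/graph.txt")  # example path
        for record in records:
            yield (record, len(graph))  # placeholder per-record work

    # stream is my DStream
    stream.foreachRDD(lambda rdd: rdd.mapPartitions(process_partition).count())

I expected the "Initializing graph" line to be printed once per executor, but it shows up many more times, each with a different pid.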