Hi,

I would like to know how serialized closures are shipped: are they sent
once per executor or copied with each task? My understanding is that they
are copied with each task, but the online documentation contains
misleading information.

For example, at
http://spark.apache.org/docs/1.5.2/programming-guide.html#understanding-closures-a-nameclosureslinka
it is stated that the closure is sent to each executor and shared
between tasks:

"This closure is serialized and sent to each executor. In local mode, there
is only the one executor so everything shares the same closure. In other
modes however, this is not the case and the executors running on separate
worker nodes each have their own copy of the closure."

If that were the case, what purpose would broadcast variables serve?
However, I ran a test, and it seems that closures are not shared between
tasks on the same executor but are copied with each task.
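For what it's worth, the effect can be sketched without Spark at all: serializing the data a closure captures gives a rough idea of the payload that would be re-shipped with every task. This is a minimal plain-pickle illustration; the names (`big_table`, `make_mapper`) are my own, not Spark API:

```python
import pickle

# A large read-only lookup table captured by the closure.
big_table = list(range(100_000))

def make_mapper(table):
    # The returned function closes over `table`; if this were a Spark
    # task closure, the captured table would be serialized along with it.
    def mapper(x):
        return table[x % len(table)]
    return mapper

mapper = make_mapper(big_table)

# Approximate the per-task payload by serializing the captured data
# (plain nested functions aren't picklable by value with stdlib pickle).
payload = pickle.dumps(big_table)
print(len(payload))  # hundreds of kilobytes, re-sent with every task
```

If closures really were cached once per executor, paying this cost per task would be unnecessary.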

At
http://spark.apache.org/docs/1.5.2/programming-guide.html#shared-variables
it is implied that the closures are indeed copied to each task:

"Broadcast variables allow the programmer to keep a read-only variable
cached on each machine rather than shipping a copy of it with tasks"
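That matches my mental model of why broadcast variables exist: the task closure carries only a small handle, while the large value is transferred once and cached per executor. A rough pure-Python sketch of that idea (my own names; the real mechanism is Spark's broadcast implementation, not this dict):

```python
import pickle

# One cache per executor process; the broadcast value lives here once.
_executor_cache = {}

def broadcast(bid, value):
    # In Spark, this transfer would happen once per executor, not per task.
    _executor_cache[bid] = value
    return bid  # task closures capture only this small handle

def task(bid, x):
    table = _executor_cache[bid]  # cached lookup, nothing re-shipped
    return table[x % len(table)]

big_table = list(range(100_000))
handle = broadcast("b0", big_table)

# The per-task payload is now the tiny handle, not the table itself.
print(len(pickle.dumps(handle)), "vs", len(pickle.dumps(big_table)))
```

Under this model, shipping closures per task and providing broadcast variables for large shared data are consistent with each other.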

Is there something I am missing?

Thanks!




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Closures-sent-once-per-executor-or-copied-with-each-tasks-tp25447.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
