Hi, I would like to know how and where serialized closures are shipped: are they sent once per executor, or copied with each task? My understanding is that they are copied with each task, but the online documentation is misleading on this point.
For example, the closures section of the programming guide (http://spark.apache.org/docs/1.5.2/programming-guide.html#understanding-closures-a-nameclosureslinka) says that the closure is sent to each executor and shared between its tasks: "This closure is serialized and sent to each executor. In local mode, there is only the one executor so everything shares the same closure. In other modes however, this is not the case and the executors running on separate worker nodes each have their own copy of the closure." If that were the case, what would be the point of broadcast variables?

However, I have run a test (a sketch of the idea is at the end of this message), and it seems that closures are not shared between tasks on the same executor but are copied with each task. The shared-variables section (http://spark.apache.org/docs/1.5.2/programming-guide.html#shared-variables) suggests the same, namely that closures are copied to each task: "Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks."

Is there something I am missing? Thanks!
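For concreteness, here is a minimal sketch of the kind of test I mean (the names Probe, DeserCounter, and ClosureCopyTest are just for illustration, not from any Spark API). It counts how many times an object captured in a closure gets deserialized; local mode is used so that the driver and executor share one JVM and the static counter is visible to the driver:

  import java.io.{IOException, ObjectInputStream}
  import java.util.concurrent.atomic.AtomicInteger

  import org.apache.spark.{SparkConf, SparkContext}

  // JVM-wide counter; readable from the driver because local mode
  // runs driver and executor in the same JVM.
  object DeserCounter {
    val count = new AtomicInteger(0)
  }

  // Serializable probe that records every time it is deserialized.
  class Probe extends Serializable {
    val tag = 1
    @throws[IOException]
    private def readObject(in: ObjectInputStream): Unit = {
      in.defaultReadObject()
      DeserCounter.count.incrementAndGet()
    }
  }

  object ClosureCopyTest {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setMaster("local[2]").setAppName("closure-copy-test"))
      val probe = new Probe                 // captured by the map closure below
      sc.parallelize(1 to 100, numSlices = 8)
        .map(x => x + probe.tag)            // forces probe into the task closure
        .count()
      // If the closure were shared per executor, this would print 1;
      // what I observe instead matches one copy per task (here, 8).
      println(s"Probe deserialized ${DeserCounter.count.get()} times")
      sc.stop()
    }
  }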
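And for contrast, this is how the same data would be shipped with a broadcast variable, which (per the docs quoted above) is cached once per machine instead of being copied into every task. This assumes the same SparkContext sc as above; the ~16 MB lookup array is hypothetical:

  // Large lookup data we do not want shipped with every task.
  val lookup: Array[Byte] = Array.fill(16 << 20)(1.toByte)

  // Without broadcast: lookup is captured in the closure, copied per task.
  val sum1 = sc.parallelize(1 to 100, 8).map(i => lookup(i) + i).reduce(_ + _)

  // With broadcast: shipped once, cached on each executor, shared by its tasks.
  val bc = sc.broadcast(lookup)
  val sum2 = sc.parallelize(1 to 100, 8).map(i => bc.value(i) + i).reduce(_ + _)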