Rather than "updating" the broadcast variable, can't you simply create a new one? Once the old one can be gc'ed in your program, it will also get cleaned up from Spark's cache (and on all executors).
I think this will make your code *slightly* more complicated, as you need to add another layer of indirection for which broadcast variable to use, but it's not too bad. E.g., change from

    var myBroadcast = sc.broadcast(...)
    (0 to 20).foreach { iteration =>
      // ... some RDD operations that involve myBroadcast ...
      myBroadcast.update(...) // wrong! don't update a broadcast variable
    }

and instead do something like:

    def oneIteration(myRDD: RDD[...], myBroadcastVar: Broadcast[...]): Unit = {
      ...
    }

    var myBroadcast = sc.broadcast(...)
    (0 to 20).foreach { iteration =>
      oneIteration(myRDD, myBroadcast)
      // create a NEW broadcast here, with whatever you need to update it
      // (reassign the existing var; a second `var` would only shadow it)
      myBroadcast = sc.broadcast(...)
    }

On Sat, May 16, 2015 at 2:01 AM, N B <nb.nos...@gmail.com> wrote:

> Thanks Ayan. Can we rebroadcast after updating in the driver?
>
> Thanks
> NB.
>
>
> On Fri, May 15, 2015 at 6:40 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> A broadcast variable is shipped to the executors used by a
>> transformation the first time it is accessed in that transformation. It
>> will NOT be updated subsequently, even if the value has changed. However,
>> the new value will be shipped to any new executor that comes into play
>> after the value has changed. So changing the value of a broadcast variable
>> is not a good idea, as it can create inconsistency within the cluster.
>> From the documentation:
>>
>> In addition, the object v should not be modified after it is broadcast
>> in order to ensure that all nodes get the same value of the broadcast
>> variable
>>
>>
>> On Sat, May 16, 2015 at 10:39 AM, N B <nb.nos...@gmail.com> wrote:
>>
>>> Thanks Ilya. Does one have to call broadcast again once the underlying
>>> data is updated in order to get the changes visible on all nodes?
>>>
>>> Thanks
>>> NB
>>>
>>>
>>> On Fri, May 15, 2015 at 5:29 PM, Ilya Ganelin <ilgan...@gmail.com>
>>> wrote:
>>>
>>>> The broadcast variable is like a pointer. If the underlying data
>>>> changes then the changes will be visible throughout the cluster.
>>>> On Fri, May 15, 2015 at 5:18 PM NB <nb.nos...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Once a broadcast variable is created using sparkContext.broadcast(),
>>>>> can it ever be updated again? The use case is for something like the
>>>>> underlying lookup data changing over time.
>>>>>
>>>>> Thanks
>>>>> NB
>>>>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>