Rather than "updating" the broadcast variable, can't you simply create a
new one?  Once the old one can be gc'ed in your program, it will also get
gc'ed from Spark's cache (and from all executors).

I think this will make your code *slightly* more complicated, as you need
to add another layer of indirection for which broadcast variable to use,
but it's not too bad.  E.g., from

var myBroadcast = sc.broadcast( ...)
(0 to 20).foreach{ iteration =>
  //  ... some rdd operations that involve myBroadcast ...
  myBroadcast.update(...) // wrong! don't update a broadcast variable
}

instead do something like:

def oneIteration(myRDD: RDD[...], myBroadcastVar: Broadcast[...]): Unit = {
 ...
}

var myBroadcast = sc.broadcast(...)
(0 to 20).foreach { iteration =>
  oneIteration(myRDD, myBroadcast)
  // create a NEW broadcast here, with whatever you need to update it;
  // reassign the outer var (no second "var", which would only shadow it)
  myBroadcast = sc.broadcast(...)
}
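
For reference, here is a fuller sketch of the same pattern that should run
in local mode.  The names loadLookup and RebroadcastSketch, the Map[String, Int]
payload and the "local[2]" master are all made up for illustration; only
sc.broadcast, Broadcast.value and Broadcast.unpersist() are Spark's own API.
The unpersist() call is optional, it just asks the executors to drop the old
copy right away instead of waiting for the driver-side gc.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

object RebroadcastSketch {
  // Hypothetical per-iteration job: counts the keys found in the broadcast lookup table.
  def oneIteration(keys: RDD[String], lookup: Broadcast[Map[String, Int]]): Long =
    keys.filter(k => lookup.value.contains(k)).count()

  // Hypothetical stand-in for however the driver reloads the changing lookup data.
  def loadLookup(iteration: Int): Map[String, Int] =
    (0 to iteration).map(i => s"key$i" -> i).toMap

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, just so the sketch is self-contained.
    val conf = new SparkConf().setAppName("rebroadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val keys = sc.parallelize(Seq("key0", "key1", "key2", "missing")).cache()
      var lookup = sc.broadcast(loadLookup(0))
      (0 to 2).foreach { iteration =>
        val hits = oneIteration(keys, lookup)
        println(s"iteration $iteration: $hits hits")
        // Ask executors to drop the old copy, then broadcast a NEW variable.
        lookup.unpersist()
        lookup = sc.broadcast(loadLookup(iteration + 1))
      }
    } finally {
      sc.stop()
    }
  }
}
```

The old value is never mutated, so every task in a given iteration sees one
consistent snapshot of the lookup data.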

On Sat, May 16, 2015 at 2:01 AM, N B <nb.nos...@gmail.com> wrote:

> Thanks Ayan. Can we rebroadcast after updating in the driver?
>
> Thanks
> NB.
>
>
> On Fri, May 15, 2015 at 6:40 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> Broadcast variables are shipped to the executors used by a transformation
>> the first time they are accessed in that transformation. They will NOT be
>> updated subsequently, even if the value has changed. However, the new value
>> will be shipped to any new executor that comes into play after the value has
>> changed. For this reason, changing the value of a broadcast variable is not
>> a good idea, as it can create inconsistency within the cluster. From the
>> documentation:
>>
>>  In addition, the object v should not be modified after it is broadcast
>> in order to ensure that all nodes get the same value of the broadcast
>> variable
>>
>>
>> On Sat, May 16, 2015 at 10:39 AM, N B <nb.nos...@gmail.com> wrote:
>>
>>> Thanks Ilya. Does one have to call broadcast again once the underlying
>>> data is updated in order to get the changes visible on all nodes?
>>>
>>> Thanks
>>> NB
>>>
>>>
>>> On Fri, May 15, 2015 at 5:29 PM, Ilya Ganelin <ilgan...@gmail.com>
>>> wrote:
>>>
>>>> The broadcast variable is like a pointer. If the underlying data
>>>> changes then the changes will be visible throughout the cluster.
>>>> On Fri, May 15, 2015 at 5:18 PM NB <nb.nos...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Once a broadcast variable is created using sparkContext.broadcast(),
>>>>> can it
>>>>> ever be updated again? The use case is for something like the
>>>>> underlying
>>>>> lookup data changing over time.
>>>>>
>>>>> Thanks
>>>>> NB
>>>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
