I don't think this has anything to do with transferring data from the driver, or per task. I'm talking about a singleton object in the JVM that loads whatever you want from wherever you want and holds it in memory, once per JVM. That is, I don't think you have to use broadcast, or any Spark mechanism at all.
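For the archives, the singleton idea could be sketched in plain Java roughly as below. This is only an illustration, not code from the thread: the class name `SpamIpList`, the refresh interval, and the stand-in `load()` method are all hypothetical; in practice `load()` would read the list from HDFS, a database, or wherever it lives.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one instance per JVM (so one per executor),
// refreshed lazily on access once a fixed interval has elapsed.
public final class SpamIpList {
    private static final long REFRESH_INTERVAL_MS = 60_000; // assumed refresh period

    private static volatile Set<String> ips = load();
    private static volatile long lastLoaded = System.currentTimeMillis();

    private SpamIpList() {}

    // Tasks on this executor call this; the list is reloaded at most once
    // per interval, not serialized and shipped per batch or per partition.
    public static Set<String> get() {
        long now = System.currentTimeMillis();
        if (now - lastLoaded > REFRESH_INTERVAL_MS) {
            synchronized (SpamIpList.class) {
                if (now - lastLoaded > REFRESH_INTERVAL_MS) {
                    ips = load();
                    lastLoaded = now;
                }
            }
        }
        return ips;
    }

    // Stand-in loader; replace with a real fetch of the current spam IP list.
    private static Set<String> load() {
        Set<String> s = ConcurrentHashMap.newKeySet();
        s.add("10.0.0.1");
        return s;
    }
}
```

Inside the streaming job you would then just call `SpamIpList.get().contains(ip)` from a filter or map function; each executor JVM keeps its own copy and refreshes it independently.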
On Mon, Jan 19, 2015 at 2:35 AM, Ji ZHANG <zhangj...@gmail.com> wrote:
> Hi Sean,
>
> Thanks for your advice; a normal 'val' will suffice. But will it be
> serialized and transferred with every batch and every partition? That's
> why broadcast exists, right?
>
> For now I'm going to use 'val', but I'm still looking for a
> broadcast-based solution.
>
>
> On Sun, Jan 18, 2015 at 5:36 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> I think this problem is not Spark-specific, since you are simply side
>> loading some data into memory. Therefore you do not need an answer that
>> uses Spark.
>>
>> Simply load the data, then poll for an update each time it is accessed,
>> or at some reasonable interval. This is just something you write in
>> Java/Scala.
>>
>> On Jan 17, 2015 2:06 PM, "Ji ZHANG" <zhangj...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I want to join a DStream with some other dataset, e.g. join a click
>>> stream with a spam IP list. I can think of two possible solutions: one
>>> is to use a broadcast variable, and the other is to use the transform
>>> operation as described in the manual.
>>>
>>> But the problem is that the spam IP list will be updated outside of
>>> the Spark Streaming program, so how can the program be notified to
>>> reload the list?
>>>
>>> Broadcast variables are immutable.
>>>
>>> For the transform operation, is it costly to reload the RDD on every
>>> batch? If it is, and I use RDD.persist(), does that mean I need to
>>> launch a thread to regularly unpersist it so that it can pick up the
>>> updates?
>>>
>>> Any ideas will be appreciated. Thanks.
>>>
>>> --
>>> Jerry
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>
>
> --
> Jerry