The result of updateStateByKey is a DStream that emits the entire state every batch, as an RDD; there is nothing special about it.
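To make that concrete: the function you pass to updateStateByKey is called once per key each batch with `(new_values, prior_state)`, and the emitted RDD holds the state for every key, not just keys seen in the latest batch. Below is a minimal pure-Python simulation of those semantics (no Spark required; `run_batch` is a hypothetical helper for illustration only, not a Spark API):

```python
from collections import defaultdict

def update_func(new_values, prior_state):
    # Sum new counts into the running state. Spark calls this once per
    # key each batch; new_values may be empty for keys with no new data.
    return (prior_state or 0) + sum(new_values)

def run_batch(batch, state, update_func):
    # Simulate one micro-batch of updateStateByKey: group the batch by
    # key, then apply update_func for every key known so far.
    grouped = defaultdict(list)
    for k, v in batch:
        grouped[k].append(v)
    new_state = {}
    for k in set(state) | set(grouped):
        result = update_func(grouped.get(k, []), state.get(k))
        if result is not None:  # returning None drops the key
            new_state[k] = result
    return new_state

state = {}
state = run_batch([("m1", 1), ("m2", 1)], state, update_func)
state = run_batch([("m1", 1)], state, update_func)
# state now carries the full accumulated counts for every key
# ("m1" and "m2"), even though "m2" sent nothing in the second batch.
```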
It is easy to join/cogroup with another RDD if you have the correct keys in both. You could load the other one when the job starts, and/or have it updated with updateStateByKey as well, based on streaming updates from Cassandra.

Sent from my iPhone

> On 22 Oct 2015, at 12:54, Arttii <a.topch...@reply.de> wrote:
>
> Hi,
>
> So I am working on a use case where clients are walking in and out of
> geofences and sending messages based on that.
> I currently have some in-memory broadcast vars to do certain lookups for
> client and geofence info; some of this is also coming from Cassandra.
> My current quandary is that I need to support the case where a user comes
> in and out of a geofence, and also track how many messages have already
> been sent and do some logic based on that.
>
> My stream is basically a bunch of JSONs:
> {
>   member: ""
>   beacon
>   state: "exit", "enter"
> }
>
> This information is invalidated at certain timesteps, say messages once a
> day and geofences every few minutes. First I thought broadcast vars might
> be good for this, but they get updated a lot, so I do not think I can
> periodically rebroadcast them from the driver.
>
> So I was thinking this might be a perfect case for updateStateByKey, as I
> can kind of track what is going on,
> and also track the time inside the values and return Nones to "pop" things.
>
> Currently I cannot wrap my head around how to use this stream in
> conjunction with some other info that is coming in DStreams/RDDs. All the
> examples for updateStateByKey are basically doing something to a stream,
> updateStateByKey, and then foreaching over it and persisting in a store. I
> don't think writing to and reading from Cassandra on every batch to get
> this info is a good idea, because I might get stale info.
>
> Is this a valid case, or am I missing the point and use case of this function?
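On the "return Nones to pop things" part of the question above: the update function can carry a timestamp inside the state value and return None once an entry goes stale, at which point Spark removes the key. A hedged pure-Python sketch of such an update function (names like `geofence_update` and the explicit `now` parameter are illustrative; in real Spark code the current time would come from the event values or a closure, since updateStateByKey does not pass a clock):

```python
TTL_SECONDS = 24 * 60 * 60  # e.g. invalidate per-member message counts daily

def geofence_update(new_events, prior_state, now):
    # State per member: (messages_sent, last_update_time).
    # new_events is the list of "enter"/"exit" events in this batch.
    sent, last_seen = prior_state if prior_state else (0, now)
    if not new_events and now - last_seen > TTL_SECONDS:
        return None  # stale entry: returning None pops the key from state
    sent += sum(1 for e in new_events if e == "enter")
    if new_events:
        last_seen = now
    return (sent, last_seen)
```

With this shape, each batch the full state DStream can then be joined (via transform) against geofence/client reference RDDs instead of re-reading Cassandra every batch.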
>
> Thanks,
> Artyom
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-StreamingStatefull-information-tp25160.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------