Hi Silvio,

Thanks for your response. I should clarify: I would like to iteratively update a structure across micro-batches, and I am not sure updateStateByKey meets my criteria.
In the current situation, I can run some map-reduce tasks and generate a JavaPairDStream<Key, Value>. After this, my algorithm is necessarily sequential: I have sorted the data by the timestamp (within the messages), and I would like to iterate over it and maintain state in which I can update a model. I tried using foreach/foreachRDD and collect to do this, but I can't seem to propagate values across micro-batches/RDDs. Sketches of what I tried, and of the updateStateByKey shape I looked at, follow at the end of this mail, below the quoted thread.

Any suggestions?

Thanks
Nipun

On Wed, Jun 17, 2015 at 10:52 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:

> Hi, just answered in your other thread as well...
>
> Depending on your requirements, you can look at the updateStateByKey API.
>
> From: Nipun Arora
> Date: Wednesday, June 17, 2015 at 10:51 PM
> To: "user@spark.apache.org"
> Subject: Iterative Programming by keeping data across micro-batches in spark-streaming?
>
> Hi,
>
> Is there any way in Spark Streaming to keep data across multiple micro-batches, like in a HashMap or something? Can anyone make suggestions on how to keep data across iterations, where each iteration is an RDD being processed in a JavaDStream?
>
> This is especially the case when I am trying to update a model, compare two sets of RDDs, or keep a global history of certain events that will impact operations in future iterations. I would like to keep some accumulated history to make calculations: not the entire dataset, but certain persisted events which can be used in future JavaDStream RDDs.
>
> Thanks
> Nipun
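P.S. For concreteness, here is the shape of what I tried with foreachRDD and collect. This is a minimal sketch, assuming a hypothetical stream of String events already sorted by timestamp. The idea is that the closure passed to foreachRDD runs on the driver once per batch, so a plain driver-side structure should survive across micro-batches (unlike mutations made inside map/foreach, which happen on executor-side copies and never reach the driver):

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaDStream;

public final class DriverSideHistory {

    // Lives on the driver; persists across micro-batches.
    private static final List<String> HISTORY = new ArrayList<String>();

    public static void attach(JavaDStream<String> sortedEvents) {
        // Spark 1.x foreachRDD takes Function<JavaRDD<T>, Void>;
        // Spark 2.x changed this to VoidFunction<JavaRDD<T>>.
        sortedEvents.foreachRDD(new Function<JavaRDD<String>, Void>() {
            @Override
            public Void call(JavaRDD<String> rdd) {
                // collect() brings this batch to the driver, so the
                // sequential model update runs here in timestamp order.
                for (String event : rdd.collect()) {
                    HISTORY.add(event); // visible to the next batch's call()
                }
                return null;
            }
        });
    }
}

Note that collect() pulls the whole batch to the driver, which only seems defensible here because the update is inherently sequential anyway.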
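And here is roughly the updateStateByKey shape I have been looking at. Again a minimal sketch, written against the Spark 1.x Java API (so Optional below is Guava's; newer Spark versions ship org.apache.spark.api.java.Optional instead), with a running per-key total standing in for my model:

import java.util.List;

import com.google.common.base.Optional;

import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public final class RunningTotals {

    // For each key, folds this batch's values into state that Spark
    // carries over to the next micro-batch.
    public static JavaPairDStream<String, Long> perKeyTotals(
            JavaPairDStream<String, Long> pairs) {
        return pairs.updateStateByKey(
            new Function2<List<Long>, Optional<Long>, Optional<Long>>() {
                @Override
                public Optional<Long> call(List<Long> batchValues,
                                           Optional<Long> state) {
                    long total = state.or(0L);  // previous state, if any
                    for (Long v : batchValues) {
                        total += v;             // fold in this batch
                    }
                    return Optional.of(total);  // state for the next batch
                }
            });
    }
}

This requires checkpointing to be enabled (jssc.checkpoint(...)), and as far as I can tell the state is strictly per key and batchValues arrive in no particular order, which is why I suspect it doesn't fit a model that must be updated sequentially across the whole timestamp-sorted stream.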