Hi, I just answered in your other thread as well. Depending on your requirements, you can look at the updateStateByKey API, which lets Spark Streaming carry per-key state forward from one micro-batch to the next.
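To illustrate the idea, here is a minimal sketch of the update-function semantics behind updateStateByKey, written as plain Java so it runs standalone. The class name StateSketch, the helper applyBatch, and the use of java.util.Optional are illustrative assumptions; in a real job you would pass an update function of this shape (which receives the batch's new values for a key plus the previous state, and returns the new state) to JavaPairDStream.updateStateByKey, after enabling checkpointing on the StreamingContext.

```java
import java.util.*;

public class StateSketch {
    // Mimics the update function you would pass to updateStateByKey:
    // given the new values seen for a key in this micro-batch and the
    // previous state for that key, return the new state.
    // (Spark's Java API uses its own Optional type; java.util.Optional
    // stands in for it here.)
    static Optional<Integer> updateCount(List<Integer> newValues, Optional<Integer> prevState) {
        int sum = prevState.orElse(0);
        for (int v : newValues) sum += v;
        return Optional.of(sum);
    }

    // Illustrative driver: applies the update function to every key of one
    // micro-batch, roughly the way Spark does per key under the hood.
    static Map<String, Integer> applyBatch(Map<String, Integer> state,
                                           Map<String, List<Integer>> batch) {
        Map<String, Integer> next = new HashMap<>(state);
        for (Map.Entry<String, List<Integer>> e : batch.entrySet()) {
            Optional<Integer> updated =
                updateCount(e.getValue(), Optional.ofNullable(state.get(e.getKey())));
            updated.ifPresent(s -> next.put(e.getKey(), s));
        }
        return next;
    }

    public static void main(String[] args) {
        // State survives across "batches", which is the point of the API.
        Map<String, Integer> state = new HashMap<>();
        state = applyBatch(state, Map.of("error", List.of(1, 1)));
        state = applyBatch(state, Map.of("error", List.of(1), "warn", List.of(1)));
        System.out.println(state.get("error") + " " + state.get("warn")); // 3 1
    }
}
```

Note that in actual Spark Streaming code the state is held for you by the framework (and must be checkpointed), so you never manage the state map yourself as this sketch does.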
From: Nipun Arora
Date: Wednesday, June 17, 2015 at 10:51 PM
To: user@spark.apache.org
Subject: Iterative Programming by keeping data across micro-batches in spark-streaming?

Hi,

Is there any way in Spark Streaming to keep data across multiple micro-batches, like in a HashMap or something? Can anyone suggest how to keep data across iterations, where each iteration is an RDD being processed in a JavaDStream?

This comes up especially when I am trying to update a model, compare two sets of RDDs, or keep a global history of certain events that will impact operations in future iterations. I would like to keep some accumulated history to make calculations: not the entire dataset, but certain persisted events which can be used in future JavaDStream RDDs.

Thanks
Nipun