Depending on your requirements, you can look at the updateStateByKey API.
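
As a minimal sketch of what that looks like with the Java API (Spark 1.x, which uses Guava's Optional): the socket source, checkpoint path, and running-count update function below are illustrative assumptions, not something from this thread. The key point is that updateStateByKey carries per-key state forward from one micro-batch to the next, and it requires a checkpoint directory so that state survives across batches.

import java.util.List;

import com.google.common.base.Optional;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class RunningCount {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("RunningCount").setMaster("local[2]");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    // updateStateByKey needs a checkpoint directory to persist state across batches
    jssc.checkpoint("/tmp/spark-checkpoint");

    // Placeholder source: lines arriving on a local socket
    JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);

    JavaPairDStream<String, Integer> events =
        lines.mapToPair(line -> new Tuple2<String, Integer>(line, 1));

    // Fold each micro-batch's new values for a key into that key's running state
    Function2<List<Integer>, Optional<Integer>, Optional<Integer>> updateFunc =
        (newValues, state) -> {
          int sum = state.or(0);
          for (Integer v : newValues) {
            sum += v;
          }
          return Optional.of(sum);
        };

    JavaPairDStream<String, Integer> runningCounts = events.updateStateByKey(updateFunc);
    runningCounts.print();

    jssc.start();
    jssc.awaitTermination();
  }
}

One caveat: updateStateByKey visits every key in the state on each batch, so keep the state bounded; returning Optional.absent() from the update function drops a key you no longer need.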

From: Nipun Arora
Date: Wednesday, June 17, 2015 at 10:48 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>"
Subject: <no subject>

Hi,

Is there any way in Spark Streaming to keep data across multiple micro-batches, for 
example in a HashMap or a similar structure?
Can anyone suggest how to keep data across iterations, where each iteration is an 
RDD being processed in a JavaDStream?

This is especially relevant when I am trying to update a model, compare two sets 
of RDDs, or keep a global history of certain events that will impact operations 
in future iterations.
I would like to keep some accumulated history to make calculations: not the 
entire dataset, but certain persisted events that can be used in future 
JavaDStream RDDs.

Thanks
Nipun
