Hi, I just answered in your other thread as well. Depending on your requirements, you can look at the updateStateByKey API, which lets Spark Streaming carry per-key state forward from one micro-batch to the next.
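To illustrate the idea, here is a minimal sketch of the update-function semantics behind updateStateByKey, written as plain Java so it runs standalone. The class name StateSketch, the helper applyBatch, and the use of java.util.Optional are illustrative assumptions; in a real job you would pass an update function of this shape (which receives the batch's new values for a key plus the previous state, and returns the new state) to JavaPairDStream.updateStateByKey, after enabling checkpointing on the StreamingContext.

```java
import java.util.*;

public class StateSketch {
    // Mimics the update function you would pass to updateStateByKey:
    // given the new values seen for a key in this micro-batch and the
    // previous state for that key, return the new state.
    // (Spark's Java API uses its own Optional type; java.util.Optional
    // stands in for it here.)
    static Optional<Integer> updateCount(List<Integer> newValues, Optional<Integer> prevState) {
        int sum = prevState.orElse(0);
        for (int v : newValues) sum += v;
        return Optional.of(sum);
    }

    // Illustrative driver: applies the update function to every key of one
    // micro-batch, roughly the way Spark does per key under the hood.
    static Map<String, Integer> applyBatch(Map<String, Integer> state,
                                           Map<String, List<Integer>> batch) {
        Map<String, Integer> next = new HashMap<>(state);
        for (Map.Entry<String, List<Integer>> e : batch.entrySet()) {
            Optional<Integer> updated =
                updateCount(e.getValue(), Optional.ofNullable(state.get(e.getKey())));
            updated.ifPresent(s -> next.put(e.getKey(), s));
        }
        return next;
    }

    public static void main(String[] args) {
        // State survives across "batches", which is the point of the API.
        Map<String, Integer> state = new HashMap<>();
        state = applyBatch(state, Map.of("error", List.of(1, 1)));
        state = applyBatch(state, Map.of("error", List.of(1), "warn", List.of(1)));
        System.out.println(state.get("error") + " " + state.get("warn")); // 3 1
    }
}
```

Note that in actual Spark Streaming code the state is held for you by the framework (and must be checkpointed), so you never manage the state map yourself as this sketch does.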
From: Nipun Arora
Date: Wednesday, June 17, 2015 at 10:51 PM
To: user@spark.apache.org
Subject: Iterative Programming by keeping data across micro-batches in spark-streaming?

Hi,

Is there any way in Spark Streaming to keep data across multiple micro-batches, like in a HashMap or something? Can anyone suggest how to keep data across iterations, where each iteration is an RDD being processed in a JavaDStream?

This comes up especially when I am trying to update a model, compare two sets of RDDs, or keep a global history of certain events that will impact operations in future iterations. I would like to keep some accumulated history to make calculations: not the entire dataset, but certain persisted events which can be used in future JavaDStream RDDs.

Thanks
Nipun