If you don't need the counts in between the DB writes, you could simply use a 5
minute window for updateStateByKey and use foreachRDD on the resulting DStream.
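Roughly something like this (untested sketch; I'm assuming your stream is already
a DStream[(String, Long)] of (eventType, 1L) pairs named `events`, and that you
have set ssc.checkpoint(...), which updateStateByKey requires):

import org.apache.spark.streaming.Minutes

// non-overlapping 5 minute windows, so downstream RDDs appear every 5 minutes
val counts = events
  .window(Minutes(5), Minutes(5))
  .updateStateByKey[Long] { (values, state) =>
    Some(state.getOrElse(0L) + values.sum)   // running total per event type
  }

// fires once per 5 minute window
counts.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // open a connection and write (eventType, count) rows to the DB
  }
}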

Even simpler, you could use reduceByKeyAndWindow directly.
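That gives you the count per 5 minute window rather than a running total, which
may be all you need. Same assumptions as above:

// counts per event type over each 5 minute window (not cumulative)
val windowCounts = events.reduceByKeyAndWindow(_ + _, Minutes(5), Minutes(5))

windowCounts.foreachRDD { rdd =>
  // write the per-window counts to the DB
}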

Lastly, you could keep a variable on the driver and check whether 5 minutes have
passed inside foreachRDD, even if the batch duration is shorter.
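Something like this; the timer variable lives on the driver because the outer
body of foreachRDD runs there once per batch (here `stateCounts` is the
hypothetical DStream you get back from updateStateByKey):

// runs on the driver every batch; only writes every ~5 minutes
var lastWrite = System.currentTimeMillis()

stateCounts.foreachRDD { rdd =>
  val now = System.currentTimeMillis()
  if (now - lastWrite >= 5 * 60 * 1000) {
    rdd.foreachPartition { partition =>
      // write the current counts to the DB
    }
    lastWrite = now
  }
}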

Also, remember to clean up the state in your updateStateByKey function or it
will grow unbounded (sketch below). I still believe one of the built-in ByKey
functions is the simpler strategy.
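For the cleanup, your update function can return None to drop a key. One way is
to keep a last-seen timestamp alongside the count (the one-hour expiry here is
just a hypothetical policy):

// state is (count, lastSeenMillis); returning None removes the key
def updateCount(values: Seq[Long], state: Option[(Long, Long)]): Option[(Long, Long)] = {
  val now = System.currentTimeMillis()
  val (count, lastSeen) = state.getOrElse((0L, now))
  if (values.nonEmpty) Some((count + values.sum, now))
  else if (now - lastSeen > 60 * 60 * 1000) None   // drop keys idle for over an hour
  else Some((count, lastSeen))
}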

hope this helps.

-adrian

Sent from my iPhone

> On 16 Sep 2015, at 22:33, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
> 
> Hello.
> 
> I have a streaming job that is processing data.  I process a stream of 
> events, taking actions when I see anomalous events.  I also keep a count of
> events observed, using updateStateByKey to maintain a map of type to count.  I
> would like to periodically (every 5 minutes) write the results of my counts
> to a database.  Is there a built-in mechanism or established pattern to
> execute periodic jobs in Spark Streaming?
> 
> Regards,
> 
> Bryan Jeffrey
