updateStateByKey for window batching

Dávid Szakállas Mon, 22 Aug 2016 03:50:18 -0700

Hi!

I’m curious about the fault-tolerance properties of stateful streaming 
operations. I am specifically interested in updateStateByKey.
What happens if a node fails during processing? Is the state recoverable?


Our use case is the following: we have messages arriving from a message queue 
about updating a resource specified in the message. When such an update request 
arrives, we wait a specific amount of time, and if another update message 
pointing to the same resource arrives in that window, we batch them together. 
Once the time has elapsed since the first message in the window, we update the 
resource. We thought about using updateStateByKey with the resource 
identifier as the key.
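
To make the idea concrete, here is a rough sketch of the per-key update 
function in plain Python, with no Spark dependency. It follows the shape that 
updateStateByKey expects in PySpark (the new values for a key plus the previous 
state, returning the new state or None to drop the key). The names 
WINDOW_SECONDS, Pending and update_window are hypothetical, and the current 
time is passed in as an extra argument only to keep the sketch testable; a real 
update function does not receive it and would have to get it some other way.

```python
from typing import List, NamedTuple, Optional

WINDOW_SECONDS = 30.0  # hypothetical window length, not from our real setup


class Pending(NamedTuple):
    """Per-resource state (an assumed layout for this sketch)."""
    first_seen: float    # arrival time of the first message in the window
    messages: List[str]  # update messages batched so far
    due: bool            # True once the window has elapsed


def update_window(new_messages: List[str],
                  state: Optional[Pending],
                  now: float) -> Optional[Pending]:
    """Batch messages for one resource key; the window opens at the first one."""
    # A state marked due was emitted in the previous interval, so start over.
    if state is not None and state.due:
        state = None
    if state is None:
        if not new_messages:
            return None  # nothing pending for this key: drop the state
        first_seen, messages = now, list(new_messages)
    else:
        first_seen = state.first_seen
        messages = state.messages + list(new_messages)
    return Pending(first_seen, messages, now - first_seen >= WINDOW_SECONDS)


# Simulate three batch intervals for a single resource key.
s = update_window(["set name"], None, now=0.0)
s = update_window(["set size"], s, now=10.0)
s = update_window([], s, now=30.0)
print(s.due, s.messages)
```

Downstream, only the states with due set would be applied to the resource. 
Whether that step happens exactly once is precisely the part we are unsure 
about.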

It is important to guarantee exactly-once processing for the messages, so every 
update should happen, and no more than once.

Is this a good way to go?

Cheers,

--
David Szakallas | Software Engineer, RisingStack
Monitoring with Trace: http://trace.risingstack.com
http://risingstack.com | http://blog.risingstack.com
Twitter: @szdavid92

