Hi! I’m curious about the fault-tolerance properties of stateful streaming operations. I am specifically interested about updateStateByKey. What happens if a node fails during processing? Is the state recoverable?
Our use case is the following: we have messages arriving from a message queue about updating a resource specified in the message. When such update request arrives, we wait a specific amount of times and if in that window another update message arrives pointing in the same resource, we batch these, and update the after the time elapsed since the first in this window and update the resource. We thought about using updateStateByKey with key as the resource identifier. It is important to guarantee exactly once processing for the messages so every update should happen, and no more than once. Is it a good way to go? Cheers, -- David Szakallas | Software Engineer, RisingStack Monitoring with Trace: http://trace.risingstack.com <http://trace.risingstack.com/> http://risingstack.com <http://risingstack.com/> | http://blog.risingstack.com <http://blog.risingstack.com/> Twitter: @szdavid92
signature.asc
Description: Message signed with OpenPGP using GPGMail