Re: Stateful Stream Processing

amir bahmanyari Mon, 18 Jul 2016 10:23:51 -0700

In the meanwhile, the poor man's best choice , like me, is using an in-memory 
db to maintain "state"...Am using Open Source Redis Redis - Wikipedia, the free 
encyclopedia & its "working". But, network back/forth makes my Flink Beam app 
run slower than keeping state in runtime available only Java collection 
objects...
  
|  
|   
|   
|   |    |


   |

  |
|  
|   |  
Redis - Wikipedia, the free encyclopedia
   |   |

  |

  |

 
FYI.Cheers

      From: David Desberg <[email protected]>
 To: [email protected] 
 Sent: Friday, July 15, 2016 6:59 PM
 Subject: Re: Stateful Stream Processing
   
Kenn,
Gotcha. When do you think that revision will be done, out of curiosity? Also, 
the EWMA example is a bit simplified; there are other forecasting algorithms we 
plan to implement which require more complex state management. 
As of now, our plan is to create a sort of "scan" function which can be applied 
to a pipeline, with semantics as described in my previous email, and implement 
it for the Flink pipeline translator/runner. Any thoughts on this/is a similar 
construct at all part of the Beam roadmap? Trying to create something that 
would be useful for the community at-large, not just us. 
David
On Fri, Jul 15, 2016 at 3:51 PM, Kenneth Knowles <[email protected]> wrote:

Hi David,I'm responsible for that feature; it was under design review at the 
advent of Beam (hence the low issue number). I'm working on prepping a revision 
that adds context and generalizes to Beam, rather than just Dataflow.For your 
use case, it isn't as simple as stateful decay if you care about event time, 
since windowing does not actually reorder your inputs. With a watermark-based 
trigger you are most likely close enough, though even then if the watermark 
passes the end of multiple windows in one update they are all permitted to be 
output, in any order. And transport between transforms is not required to be 
order preserving, though obviously in many backends ends it is, especially to 
support stateful pipelines. We want to talk about ordering explicitly in Beam, 
or else presume a lack of order.The good news is you can calculate the EWMA 
directly using sliding windows that are large enough so their tail has 
negligible contribution. This should work well and as a bonus you'll get the 
full time series.Apologies if I've misunderstood what you are hoping to 
accomplish.KennOn Jul 15, 2016 3:06 PM, "David Desberg" 
<[email protected]> wrote:
>
> Hi all,
>
> I'm working on a simple Beam app which should compute an exponential weighted 
> moving average on a stream of data, by key. The data is windowed at a fixed 
> interval, the count of the elements is taken per-key, and then this count 
> should be utilized to update the moving average. This requires maintaining 
> state (local to each JVM instance/per-key) in the form of the result of the 
> previous computation. Either a key-value based state store (as is available 
> in Flink) or an implementation of scan semantics (where the result of the 
> previous computation is passed as the initial value to the next invocation) 
> would work; however, there does not seem to be a way to achieve either of 
> these with the Beam API as it currently stands. 
>
> I noticed a related JIRA issue 
> (https://issues.apache.org/jira/browse/BEAM-25) but it seems no progress has 
> been made. Is there vision/roadmap for this API? I would be happy to 
> contribute to the project by beginning an implementation and would love to 
> collaborate with anyone already working toward this goal.
>
> Thanks!
> David Desberg

Re: Stateful Stream Processing

Reply via email to