Let's say I want to sum values over increasing window sizes of 1, 5, 15, and 60
minutes.  Right now the four windows run in parallel, meaning that if I am
producing 1k records/sec I am consuming 4k records/sec, since each window size
feeds its own calculation.  In reality I am calculating far more than a sum, so
with this pattern my consumption rate looks something like
(producing rate) * (calculations) * (windows).
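
For concreteness, here's roughly the shape of what I have today.  This is only a
sketch with made-up topic and key names, and I'm assuming a 2.x-era Java DSL
(newer releases rename some of these factory methods):

import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

public class ParallelWindows {

    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // hypothetical input: one record per event, keyed by entity id
        KStream<String, Long> values = builder.stream("values",
                Consumed.with(Serdes.String(), Serdes.Long()));

        KGroupedStream<String, Long> grouped =
                values.groupByKey(Grouped.with(Serdes.String(), Serdes.Long()));

        // every window size is an independent aggregation, so each one
        // re-consumes the full input stream -- 4x the producing rate here
        for (long minutes : new long[] {1, 5, 15, 60}) {
            grouped.windowedBy(TimeWindows.of(Duration.ofMinutes(minutes)))
                   .reduce(Long::sum);
        }

        return builder;
    }
}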

 So I had an idea: could I feed the 1-minute window into the 5-minute, the 5
into the 15, and the 15 into the 60?  In theory I would consume a fraction of
the records, wouldn't have to scale out nearly as much, and would be back to
something like (producing rate) * (calculations) + (updates).
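
Concretely, the wiring I have in mind for the first hop is roughly the following
(again only a sketch with made-up names; I'm faking the 5-minute bucket by
rounding each 1-minute window start down to a 5-minute boundary):

import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class CascadedWindows {

    private static final long FIVE_MIN_MS = Duration.ofMinutes(5).toMillis();

    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, Long> values = builder.stream("values",
                Consumed.with(Serdes.String(), Serdes.Long()));

        // only this aggregation reads the raw stream
        KTable<Windowed<String>, Long> oneMinuteSums = values
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
                .reduce(Long::sum);

        // re-key each 1-minute result into its containing 5-minute bucket
        KStream<String, Long> fiveMinuteInput = oneMinuteSums
                .toStream()
                .map((windowedKey, sum) -> KeyValue.pair(
                        windowedKey.key() + "@"
                                + (windowedKey.window().start() / FIVE_MIN_MS) * FIVE_MIN_MS,
                        sum));

        // this is the part I'm unsure about: these records are KTable
        // *updates* of the 1-minute sums, so a plain reduce over them
        // adds every update and over-counts
        fiveMinuteInput
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .reduce(Long::sum);

        return builder;
    }
}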

  Thinking this was an awesome idea, I went to implement it and got twisted
around.  These are windowed grouping operations that produce KTables, which
means that instead of a raw stream I have an update stream.  To me this implies
that the downstream stage must be aware of that and consume statefully, knowing
that each record is an update rather than an addition.  Does the high-level API
handle that construct and let me do this?  For a simple sum it would have to
hold the latest value of each of the five 1-minute sums in a given window in
order to produce the 5-minute sum.  Reading the docs (which are great), I cannot
determine whether KTable.groupBy() would work over a window, and whether reduce
or aggregate would do what I need.
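
To make the question concrete, this is the shape I was imagining: group the
1-minute KTable by its containing 5-minute bucket and rely on the
adder/subtractor pair so that each new value replaces the one it supersedes
instead of piling on top of it.  I have not verified that grouping a windowed
KTable like this actually behaves the way I hope, which is really what I'm
asking (names and the string bucket key are again made up):

import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Windowed;

public class FiveMinuteFromOneMinute {

    private static final long FIVE_MIN_MS = Duration.ofMinutes(5).toMillis();

    // oneMinuteSums is the windowed KTable produced by the 1-minute aggregation
    public static KTable<String, Long> fiveMinuteSums(
            KTable<Windowed<String>, Long> oneMinuteSums) {

        return oneMinuteSums
                // re-key each 1-minute window into its containing 5-minute bucket
                .groupBy(
                        (windowedKey, sum) -> KeyValue.pair(
                                windowedKey.key() + "@"
                                        + (windowedKey.window().start() / FIVE_MIN_MS) * FIVE_MIN_MS,
                                sum),
                        Grouped.with(Serdes.String(), Serdes.Long()))
                // the adder applies the new 1-minute value and the subtractor
                // removes the value it replaced, so updates should not double-count
                .reduce(Long::sum, (aggregate, oldValue) -> aggregate - oldValue);
    }
}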

Any ideas?
