Hey all,
Just wanted to follow up. I haven't had much time to work on this; I've
been trying to figure out why leaving the stream on for a few days causes
some of the stream apps to go into a death spiral of
connecting/disconnecting. The store growing unbounded stopped me in my
tracks.
Just had a thought:
If you implement the Windowed/Tuple serde to store the timestamp(s)
before the actual record key then you can simply periodically do a
ranged query on each of the state stores to find and delete all data
older than ... (using punctuate() inside a Processor).
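A minimal sketch of that idea, using a TreeMap over raw bytes in place of the RocksDB-backed state store (class and method names here are illustrative, not Kafka Streams API):

```java
import java.nio.ByteBuffer;
import java.util.TreeMap;

public class TimestampPrefixedStore {
    // Serialize the window timestamp *before* the record key, so the
    // store's natural byte order is chronological.
    static byte[] compositeKey(long timestampMs, byte[] recordKey) {
        return ByteBuffer.allocate(Long.BYTES + recordKey.length)
                .putLong(timestampMs)   // big-endian: sorts oldest-first
                .put(recordKey)
                .array();
    }

    public static void main(String[] args) {
        // A TreeMap with unsigned lexicographic byte order stands in
        // for the state store's sorted key space.
        TreeMap<byte[], Long> store = new TreeMap<>((a, b) -> {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
                if (c != 0) return c;
            }
            return Integer.compare(a.length, b.length);
        });
        store.put(compositeKey(1_000L, "a".getBytes()), 1L);
        store.put(compositeKey(2_000L, "b".getBytes()), 2L);
        store.put(compositeKey(90_000L, "a".getBytes()), 3L);

        // What punctuate() would do periodically: one ranged scan,
        // deleting every entry older than the cutoff.
        long cutoffMs = 60_000L;
        store.headMap(compositeKey(cutoffMs, new byte[0])).clear();

        System.out.println(store.size()); // prints 1
    }
}
```

Because the timestamp is the key prefix, the "delete everything older than X" operation is a single contiguous range rather than a full store scan.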
Hi Matthias,
Yes, the ever-growing stores were my concern too. That was the intention
behind my TODO note in the first reply; I just didn't want to touch on
this until I'd dug deeper into it.
I understand the compaction+retention policy on the backing changelog
topics takes care of cleaning up on
Hi Garrett,
I'm glad this helped.
You're absolutely right, only the "oneMinuteWindowed" KTable has a
Windowed key - apologies again for getting it wrong the first time.
I admit I used window().end() arbitrarily. If window().start() matches
your semantics better, use that. Further on that
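For the record, the two differ only in which edge labels the bucket (assuming the usual convention of start-inclusive, end-exclusive windows); a tiny standalone illustration for a one-minute window:

```java
public class WindowEdges {
    public static void main(String[] args) {
        long sizeMs = 60_000L;                     // one-minute window
        long eventTs = 125_000L;                   // some record timestamp
        long start = eventTs - (eventTs % sizeMs); // window().start() analogue
        long end = start + sizeMs;                 // window().end() analogue
        System.out.println(start + ".." + end);    // prints 120000..180000
    }
}
```

Whichever edge you pick, pick it consistently, since downstream re-keying and any time-based cleanup will compare against the same label.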
This seems to be a question that might affect many users, and it might
be worth documenting it somewhere as a recommended pattern.
I was thinking the same thing :)
How about a page on the wiki listing useful patterns, with subpages for
each pattern in detail? (like for KIPs)
Thanks,
Michał
Thinking about this once more (and also having a fresh memory of another
thread about KTables), I am wondering if this approach needs some extra
tuning:
As the first window aggregation produces an output stream with an
unbounded key space, the following (non-windowed) KTables would
Michal,
This is slick! I am still writing unit tests to verify it. My code
looks something like:
KTable<Windowed<Key>, Value> oneMinuteWindowed =
    srcStream // my val object isn't really called that, just wanted to show
    // a sample set of calculations the value can do!
Michal,
that's an interesting idea. In an ideal world, Kafka Streams should have
an optimizer that is able to do this automatically under the hood. Too
bad we are not there yet.
@Garrett: did you try this out?
Apologies,
In the code snippet, of course, only the oneMinuteWindowed KTable will
have a Windowed key (KTable<Windowed<Key>, Value>); all the others would
be just KTable<Key, Value>.
Michał
On 07/05/17 16:09, Michal Borowiecki wrote:
Hi Garrett,
I've encountered a similar challenge in a project I'm working on (it's
still a work in progress, so please take my suggestions with a grain of
salt).
Yes, I believe KTable.groupBy lets you accomplish what you are aiming
for with something like the following (same snippet attached
Let's say I want to sum values over increasing window sizes of 1, 5, 15,
and 60 minutes. Right now I have them running in parallel, meaning if I
am producing 1k records/sec I am consuming 4k/sec to feed each
calculation. In reality I am calculating far more than a sum, and in
this pattern I'm looking at
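The cascading alternative discussed above can be sketched without Kafka at all: keep the 1-minute sums, then roll each 1-minute bucket up into its enclosing 5/15/60-minute bucket, instead of re-reading every source event four times. This is only the bucket arithmetic; names are illustrative, and plain maps stand in for the KTables:

```java
import java.util.Map;
import java.util.TreeMap;

public class CascadedWindows {
    // Map a timestamp to the start of its enclosing window.
    static long windowStart(long tsMs, long sizeMs) {
        return tsMs - (tsMs % sizeMs);
    }

    public static void main(String[] args) {
        // 1-minute sums keyed by window start (the "oneMinuteWindowed" stage).
        Map<Long, Long> oneMinute = new TreeMap<>();
        oneMinute.put(0L, 10L);        // 00:00
        oneMinute.put(60_000L, 20L);   // 00:01
        oneMinute.put(300_000L, 5L);   // 00:05

        // Roll 1-minute sums up into 5-minute sums: one input record per
        // minute, rather than re-consuming every source event.
        Map<Long, Long> fiveMinute = new TreeMap<>();
        for (Map.Entry<Long, Long> e : oneMinute.entrySet()) {
            long bucket = windowStart(e.getKey(), 300_000L);
            fiveMinute.merge(bucket, e.getValue(), Long::sum);
        }
        System.out.println(fiveMinute); // prints {0=30, 300000=5}
    }
}
```

The 15- and 60-minute levels would chain the same way off the 5-minute sums, so the per-level input rate drops with each step instead of staying at the source rate.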