Re: How to chain increasing window operations one after another

2017-05-12 Thread Garrett Barton
Hey all, just wanted to follow up. I have not had much time to work on this; I've been trying to figure out why leaving the stream on for a few days causes some of the stream apps to go into a death spiral of connecting/disconnecting. The store growing unbounded stopped me in my tracks and I

Re: How to chain increasing window operations one after another

2017-05-09 Thread Michal Borowiecki
Just had a thought: if you implement the Windowed/Tuple serde to store the timestamp(s) before the actual record key, then you can simply periodically do a ranged query on each of the state stores to find and delete all data older than ... (using punctuate() inside a Processor). Any
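Michal's retention idea relies on the store's keys sorting by timestamp first, so a single ranged scan can find everything past retention. The mechanic can be sketched in plain Java, using a TreeMap as a stand-in for the timestamp-prefixed state store (the key layout and values here are illustrative, not from the thread):

```java
import java.util.TreeMap;

public class TimestampedStoreCleanup {
    public static void main(String[] args) {
        // Stand-in for a state store whose serialized key begins with the
        // window timestamp, so entries sort by time first (as Michal suggests).
        TreeMap<Long, String> store = new TreeMap<>();
        store.put(1_000L, "a");
        store.put(2_000L, "b");
        store.put(90_000L, "c");

        // What punctuate() would do periodically: a ranged query for
        // everything older than the retention cutoff, then delete it.
        long cutoff = 60_000L;
        store.headMap(cutoff).clear();  // removes all entries with ts < cutoff

        System.out.println(store);  // {90000=c}
    }
}
```

In a real Processor the deletion would run on a schedule inside punctuate(), iterating the store's range up to the cutoff rather than clearing a map view.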

Re: How to chain increasing window operations one after another

2017-05-09 Thread Michal Borowiecki
Hi Matthias, Yes, the ever-growing stores were my concern too. That was the intention behind my TODO note in the first reply; I just didn't want to touch on this until I'd dug deeper into it. I understand compaction+retention policy on the backing changelog topics takes care of cleaning up on

Re: How to chain increasing window operations one after another

2017-05-09 Thread Michal Borowiecki
Hi Garrett, I'm glad this helped. You're absolutely right, only the "oneMinuteWindowed" KTable has a Windowed key - apologies again for getting it wrong the first time. I admit I used window().end() arbitrarily. If window().start() matches your semantics better, use that. Further on that
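The choice between window().start() and window().end() that Michal calls arbitrary is not just cosmetic: for the last 1-minute window inside each larger window, the two boundaries land in different parent buckets. A small self-contained illustration (bucketStart is a hypothetical helper, not from the thread):

```java
public class WindowBoundaryChoice {
    // Align a timestamp (ms) to the start of its enclosing bucket.
    static long bucketStart(long ts, long bucketMs) {
        return ts - ts % bucketMs;
    }

    public static void main(String[] args) {
        long MIN = 60_000L, FIVE = 5 * MIN;
        // The 1-minute window covering [240000, 300000):
        long start = 4 * MIN, end = 5 * MIN;

        // Re-keying by window start assigns it to the 5-minute bucket
        // starting at 0; re-keying by window end assigns it to the next one.
        System.out.println(bucketStart(start, FIVE)); // 0
        System.out.println(bucketStart(end, FIVE));   // 300000
    }
}
```

So the boundary to re-key on should match whether a sub-window ending exactly on a larger window's boundary belongs to that window or the next.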

Re: How to chain increasing window operations one after another

2017-05-09 Thread Michal Borowiecki
This seems to be a question that might affect many users, and it might be worth documenting it somewhere as a recommended pattern. I was thinking the same thing :) How about a page on the wiki listing useful patterns, with subpages for each pattern in detail? (like for KIPs) Thanks, Michał

Re: How to chain increasing window operations one after another

2017-05-08 Thread Matthias J. Sax
Thinking about this once more (and also having a fresh memory of another thread about KTables), I am wondering if this approach needs some extra tuning: as the first window aggregation produces an output stream with an unbounded key space, the following (non-windowed) KTables would
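Matthias's concern can be made concrete: once the windowed key is flattened into a plain key (say "key@windowStart"), the downstream non-windowed KTable gains a new key every window interval and nothing ever expires it. A hedged plain-Java sketch of that growth (the key format and rates are illustrative, not from the thread):

```java
import java.util.HashMap;
import java.util.Map;

public class UnboundedKeySpace {
    public static void main(String[] args) {
        long MIN = 60_000L, FIVE = 5 * MIN;

        // A non-windowed KTable store keyed by "key@windowStart" keeps every
        // window ever seen: one new entry per source key per 5-minute bucket.
        Map<String, Long> fiveMinStore = new HashMap<>();
        for (long t = 0; t < 60 * MIN; t += MIN)   // one hour of 1-min updates
            fiveMinStore.merge("sensor1@" + (t / FIVE * FIVE), 1L, Long::sum);

        // After one hour, a single source key already owns 12 store entries,
        // and the count keeps growing linearly with time.
        System.out.println(fiveMinStore.size());  // 12
    }
}
```

A windowed store with retention would drop old buckets automatically; the flattened-key KTable needs the explicit cleanup discussed elsewhere in this thread.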

Re: How to chain increasing window operations one after another

2017-05-08 Thread Garrett Barton
Michal, This is slick! I am still writing unit tests to verify it. My code looks something like: KTable oneMinuteWindowed = srcStream ... // my val object isn't really called that, just wanted to show a sample set of calculations the value can do!

Re: How to chain increasing window operations one after another

2017-05-08 Thread Matthias J. Sax
Michal, that's an interesting idea. In an ideal world, Kafka Streams should have an optimizer that is able to do this automatically under the hood. Too bad we are not there yet. @Garrett: did you try this out? This seems to be a question that might affect many users, and it might be worth to

Re: How to chain increasing window operations one after another

2017-05-08 Thread Michal Borowiecki
Apologies, in the code snippet of course only the oneMinuteWindowed KTable will have a Windowed key (KTable<Windowed<Key>, Value>), all others would be just KTable<Key, Value>. Michał On 07/05/17 16:09, Michal Borowiecki wrote: Hi Garrett, I've encountered a similar challenge in a

Re: How to chain increasing window operations one after another

2017-05-07 Thread Michal Borowiecki
Hi Garrett, I've encountered a similar challenge in a project I'm working on (it's still work in progress, so please take my suggestions with a grain of salt). Yes, I believe KTable.groupBy lets you accomplish what you are aiming for with something like the following (same snippet attached
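The chaining Michal describes amounts to: aggregate the raw stream once into 1-minute sums, then feed those partial sums into the coarser windows instead of re-reading the source. Outside the Kafka Streams DSL, the arithmetic looks like this plain-Java simulation (event timestamps and values are illustrative, not from the thread):

```java
import java.util.Map;
import java.util.TreeMap;

public class WindowRollup {
    // Align a timestamp (ms) to the start of its window of the given size.
    static long windowStart(long ts, long windowMs) {
        return ts - ts % windowMs;
    }

    public static void main(String[] args) {
        long MIN = 60_000L;
        // Raw events: (timestamp, value)
        long[][] events = { {5_000, 1}, {65_000, 2}, {130_000, 3}, {250_000, 4} };

        // Stage 1: 1-minute sums, computed once from the raw stream.
        Map<Long, Long> oneMin = new TreeMap<>();
        for (long[] e : events)
            oneMin.merge(windowStart(e[0], MIN), e[1], Long::sum);

        // Stage 2: 5-minute sums derived from the 1-minute sums
        // (no second pass over the raw events), as with KTable.groupBy.
        Map<Long, Long> fiveMin = new TreeMap<>();
        oneMin.forEach((start, sum) ->
            fiveMin.merge(windowStart(start, 5 * MIN), sum, Long::sum));

        System.out.println(oneMin);   // {0=1, 60000=2, 120000=3, 240000=4}
        System.out.println(fiveMin);  // {0=10}
    }
}
```

Sums compose this way because addition is associative; the same chaining works for any aggregate that can be merged from partial results (counts, min/max), but not for ones that need the raw records (e.g. exact distinct counts).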

How to chain increasing window operations one after another

2017-05-02 Thread Garrett Barton
Let's say I want to sum values over increasing window sizes of 1, 5, 15, and 60 minutes. Right now I have them running in parallel, meaning if I am producing 1k/sec records I am consuming 4k/sec to feed the four calculations. In reality I am calculating far more than sums, and in this pattern I'm looking at