That depends on if your using event, processing or ingestion time. My understanding is that if you play a record through that is T-6, the only way that TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5)) would actually process that record in your window is if your using processing time. Otherwise the record is skipped and no data is generated/calculated for that operation. So depending on what your doing you would not increase any more memory usage than when consuming from real-time.
On Wed, May 3, 2017 at 3:37 AM, João Peixoto <joao.harti...@gmail.com> wrote: > The base question I'm trying to answer is "how much memory does my instance > need". > > Considering a use case where I want to keep a rolling average on a tumbling > window of 1 minute size allowing for late arrivals up to 5 minutes (lower > bound) we would have something like this: > > TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until( > TimeUnit.MINUTES.toMillis(5)) > > The aggregate key size is 8 bytes, the average value is 8 bytes and for > de-duplication purposes we need to keep track of which messages we saw > already, so a list of keys averaging 10 entries. > > If I understand correctly this means that each window will be on average 96 > bytes in size. > > A single topic generates 100 messages/minute, which aggregate into 10 > independent windows. > > On any given point in time the windowed aggregates require 960 bytes of > memory at least. > > Here's the confusing part. Lets say I found an issue with my averaging > operation and I want to reprocess the last 10 hours worth of messages. > > 1. Windows will be regenerated, since most likely they were cleaned up > already > 2. The retention policy now has different semantics? If I had a late > arrival of 6 minutes, all of the sudden the reprocessing will account for > it right? Since the window is now active due to recreation (Assuming my app > is capable of processing all messages under 5 minutes) > 3. I'll be keeping 10 windows * (60 * 10) minutes for the first 5 minutes, > so my memory requirement is now 576,000 bytes? This is orders of magnitude > bigger. > > I hope this gets my doubts across clearly, feel free to ask more details. > And thanks in advance >