Hello,

I am trying to do something that seems like it should be quite simple but I 
haven’t found an efficient way to do this with Flink and I expect I’m missing 
something obvious here. 

The task is that I would like to process a sequence of events when a certain 
number appear within a keyed event-time window. There will be many keys but 
events within each keyed window will normally be quite sparse. 

My first guess was to use Flink’s sliding windowing functionality. However my 
concern is that events are duplicated for each window. I would like to be 
precise about timing so every event would trigger hundreds of copies of an 
event in hundreds of windows, most which are then discarded because there are 
insufficient events. 

My next guess was to use a process function, and maintain a queue of events as 
the state. When an event occurred I would add it to the queue and then remove 
any events which fell off the end of my window. I thought ListState would help 
here, but that appears to not allow items to be removed.

I then thought about using a ValueState with some queue data structure. However 
my understanding is that changes to a ValueState result in the entire object 
being copied and so would be quite inefficient and best avoided. 

Finally I thought about trying to just maintain a series of timers – 
incrementing on an event and decrementing on its expiry. However I then hit the 
problem of timer coalescing. If an event occurs at the same time as its 
predecessor, the timer will not get set so the counter will get incremented but 
never decremented. 

What I’m doing seems like it would be a common task but none of the options 
look good, so I feel I’m missing something. Could anyone offer some advice on 
how to handle this case?

Thanks in advance. 

Best wishes,
Steven

Reply via email to