Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-31 Thread David Gogokhiya
I have a very naive question. I know Jan suggested to use 2 successive fixed overlapping windows with offset as a temporary solution to dedup the events. However, I am wondering whether using a single fixed window of length let's say 1 day followed by a deduplicate function is a good

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-27 Thread Jan Lukavský
> If the user chooses to create a window of 10 years, I'd say it is expected behavior that the state will be kept for as long as this duration. State will be kept, the problem is that each key in the window will carry a cleanup timer, although there might be nothing to clear (there is no

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-27 Thread Maximilian Michels
If the user chooses to create a window of 10 years, I'd say it is expected behavior that the state will be kept for as long as this duration. GlobalWindows are different because they represent the default case where the user does not even use windowing. I think it warrants to be treated

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-26 Thread Jan Lukavský
Window triggering is afaik operation that is specific to GBK. Stateful DoFns can have (as shown in the case of deduplication) timers set for the GC only, triggering has no effect there. And yes, if we have other timers than GC (any user timers), then we have to have GC timer (because timers

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-26 Thread Maximilian Michels
The inefficiency described happens if and only if the following two conditions are met: a) there are many timers per single window (as otherwise they will be negligible) b) there are many keys which actually contain no state (as otherwise the timer would be negligible wrt the state size)

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-26 Thread Jan Lukavský
On 8/25/20 9:27 PM, Maximilian Michels wrote: I agree that this probably solves the described issue in the most straightforward way, but special handling for global window feels weird, as there is really nothing special about global window wrt state cleanup. Why is special handling for the

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-25 Thread Maximilian Michels
I agree that this probably solves the described issue in the most straightforward way, but special handling for global window feels weird, as there is really nothing special about global window wrt state cleanup. Why is special handling for the global window weird? After all, it is a special

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-24 Thread Maximilian Michels
I'd suggest a modified option (2) which does not use a timer to perform the cleanup (as mentioned, this will cause problems with migrating state). Instead, whenever we receive a watermark which closes the global window, we enumerate all keys and cleanup the associated state. This is the

Re: [External] Re: Memory Issue When Running Beam On Flink

2020-08-15 Thread Maximilian Michels
Awesome! Thanks a lot for the memory profile. Couple remarks: a) I can see that there are about 378k keys and each of them sets a timer. b) Based on the settings for DeduplicatePerKey you posted, you will keep track of all keys of the last 30 minutes. Unless you have much fewer keys, the