Hello all,

I have been running this code against production data, and I'm emitting counts/sums for a sentinel record id to stdout so I can observe the behaviour:

https://gist.github.com/LiamClarkeNZ/b101ce6a42a2e5e1efddfe3a98c5805f

When this code is run, the window duration is 2 minutes, the grace period is 20 seconds, and the retention time is 20 minutes. I am endeavouring to use event time as the timestamp basis for this process:

https://gist.github.com/LiamClarkeNZ/8265cec02e21f5969e0fedb8281a2180

My sentinel debugging output shows a surprising behaviour: the outbound counts for the key always sum higher than the inbound counts. For example:

Sample: 2020-04-19T07:31:37.492Z
Inbound  { 2020-04-19T03:00:00Z=4563, 2020-04-19T04:00:00Z=5629, 2020-04-19T05:00:00Z=8489, 2020-04-19T06:00:00Z=13599 }
Outbound { 2020-04-19T03:00:00Z=4717, 2020-04-19T04:00:00Z=5890, 2020-04-19T05:00:00Z=8826, 2020-04-19T06:00:00Z=13951 }

This makes me suspect one of three things: I'm not using the window I thought I was (e.g., I'm somehow using a sliding window instead of a tumbling window), I've made a rookie error somewhere in my aggregations, or I've simply misunderstood something. Also, does it matter that the window size configured on the persistent window store doesn't match the window size plus grace period used in the windowing clause? I've sketched roughly what the topology and timestamp extractor look like in the postscripts below.

Any pointers gratefully received.

Kind regards,

Liam Clarke-Hutchinson
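P.S. In case it helps without clicking through to the first gist, the shape of the windowed count (including the sentinel logging) is roughly the sketch below. It is not a verbatim copy: the topic names, serdes, store name, sentinel key, and value types are simplified placeholders, but the window size, grace period, and retention match what I described above.

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

public class WindowedCountSketch {

    // Placeholder for the record id I'm watching in production.
    static final String SENTINEL = "sentinel-record-id";

    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Tumbling 2-minute windows that close 20 seconds (the grace
        // period) after window end, measured in stream time.
        TimeWindows windows = TimeWindows.of(Duration.ofMinutes(2))
                .grace(Duration.ofSeconds(20));

        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
                // Inbound sentinel observation, printed to stdout.
                .peek((key, value) -> {
                    if (SENTINEL.equals(key)) System.out.println("Inbound: " + value);
                })
                .groupByKey()
                .windowedBy(windows)
                // The store retains window data for 20 minutes; my understanding
                // is that retention must be at least window size + grace period.
                .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("window-counts")
                        .withRetention(Duration.ofMinutes(20)))
                // Flatten the windowed key back to the plain key (window
                // boundaries are dropped here purely to keep the sketch short).
                .toStream((windowedKey, count) -> windowedKey.key())
                // Outbound sentinel observation.
                .peek((key, count) -> {
                    if (SENTINEL.equals(key)) System.out.println("Outbound: " + count);
                })
                .to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }
}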

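P.P.S. The event-time extraction from the second gist is along these lines; the Event class and its getEventTimeMillis() accessor are stand-ins for my real payload type:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Hypothetical payload type standing in for my actual event class.
class Event {
    private final long eventTimeMillis;
    Event(long eventTimeMillis) { this.eventTimeMillis = eventTimeMillis; }
    long getEventTimeMillis() { return eventTimeMillis; }
}

public class EventTimeExtractor implements TimestampExtractor {

    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        Object value = record.value();
        if (value instanceof Event) {
            // Use the event time embedded in the payload (epoch millis).
            return ((Event) value).getEventTimeMillis();
        }
        // Fall back to the record's own timestamp if the payload is unusable.
        return record.timestamp();
    }
}

The extractor is wired in either globally via the default.timestamp.extractor config or per-source via Consumed#withTimestampExtractor.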