Hello all,

I have been running this code against production data, and I'm emitting
counts/sums for a sentinel record id to stdout so I can observe the
behaviour:

https://gist.github.com/LiamClarkeNZ/b101ce6a42a2e5e1efddfe3a98c5805f
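The logging itself is along these lines (a rough sketch, not the exact
gist code; SENTINEL_ID and the stream's types are placeholders for the
real record id and payload):

    import org.apache.kafka.streams.kstream.KStream;

    // Sketch only: SENTINEL_ID stands in for the real record id.
    private static final String SENTINEL_ID = "sentinel-record-id";

    static <V> KStream<String, V> logSentinel(String label, KStream<String, V> stream) {
        // Pass the stream through unchanged, printing the sentinel key's values.
        return stream.peek((key, value) -> {
            if (SENTINEL_ID.equals(key)) {
                System.out.println(label + ": " + key + " => " + value);
            }
        });
    }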

When this code runs, the window duration is 2 minutes, the grace period
is 20 seconds, and the retention time is 20 minutes.
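In Kafka Streams terms I'd expect that configuration to look roughly like
this (illustrative names and types, not the gist's):

    import java.time.Duration;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.state.WindowStore;

    // Illustrative sketch of the stated configuration, not the gist code.
    TimeWindows windows = TimeWindows
            .of(Duration.ofMinutes(2))       // 2 minute tumbling windows
            .grace(Duration.ofSeconds(20));  // 20 second grace period

    // Retention is set on the materialized store and must be at least
    // window duration + grace period.
    Materialized<String, Long, WindowStore<Bytes, byte[]>> materialized =
            Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("sentinel-counts")
                    .withRetention(Duration.ofMinutes(20)); // 20 minute retention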

I am endeavouring to use event time as the timestamp basis for this process:
https://gist.github.com/LiamClarkeNZ/8265cec02e21f5969e0fedb8281a2180
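The extractor is wired in along these lines (a minimal sketch; MyEvent and
getEventTimeMillis() are hypothetical stand-ins for the real payload type
and its event-time accessor):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.processor.TimestampExtractor;

    // Minimal sketch: MyEvent and getEventTimeMillis() are placeholders.
    public class EventTimeExtractor implements TimestampExtractor {
        @Override
        public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
            Object value = record.value();
            if (value instanceof MyEvent) {
                return ((MyEvent) value).getEventTimeMillis();
            }
            // Fall back to the partition-time estimate for unparseable records.
            return partitionTime;
        }
    }

    // Registered via:
    // props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
    //           EventTimeExtractor.class);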

So, my sentinel debugging output shows a surprising behaviour: the
outbound counts for the key always sum higher than the inbound counts.
For example:

Sample: 2020-04-19T07:31:37.492Z

Inbound
{
    2020-04-19T03:00:00Z=4563,
    2020-04-19T04:00:00Z=5629,
    2020-04-19T05:00:00Z=8489,
    2020-04-19T06:00:00Z=13599
}

Outbound
{
    2020-04-19T03:00:00Z=4717,
    2020-04-19T04:00:00Z=5890,
    2020-04-19T05:00:00Z=8826,
    2020-04-19T06:00:00Z=13951
}

This makes me suspect that either I'm not using the window I thought I was
(e.g., I'm somehow using a sliding window instead of a tumbling one), or
I've made a rookie error somewhere in my aggregations, or I've simply
misunderstood something. Does it matter that the window size in the
persistent window store doesn't match the window duration plus grace
period in the windowing clause?
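Concretely, the mismatch I mean is between these two places a size appears
(illustrative values only):

    import java.time.Duration;
    import org.apache.kafka.streams.state.Stores;
    import org.apache.kafka.streams.state.WindowBytesStoreSupplier;

    // Illustrative only: the windowing clause above uses 2 minutes plus a
    // 20 second grace period, while an explicitly supplied store declares
    // its own window size.
    WindowBytesStoreSupplier supplier = Stores.persistentWindowStore(
            "sentinel-counts",
            Duration.ofMinutes(20),  // retention period
            Duration.ofMinutes(2),   // window size: 2 min, or 2 min 20 s with grace?
            false);                  // retainDuplicates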

Any pointers gratefully received.

Kind regards,

Liam Clarke-Hutchinson
