Hi Aljoscha,

Thank you for your reply.

On 2021/01/08 15:44 Aljoscha Krettek wrote:
>the basic problem for your use case is that window boundaries are
>inclusive for the start timestamp and exclusive for the end timestamp.

That's true. What further complicates matters is that the last value of
the window (which should also be the first value of the next window)
might not have exactly the end timestamp of the one hour window but
could be even days in the future if the sensor is powered off, for
example, over a weekend.

>It's setup like this to ensure that consecutive tumbling windows don't
>overlap. This is only a function of how our `WindowAssigner` works, so
>it could be done differently in a different system.

I have tried to learn a little bit about the `WindowAssigner` system.
I think that if I could assign an element to two windows, I could
process the cumulative counter correctly. The issue I had with a
`WindowAssigner` was that it didn't seem to give me a way to discover
which other windows exist. I wanted to assign the element to the window
that matches its timestamp and also to the previous window that holds
data. Here is an illustration of what I mean:

W1 (16:00 - 17:00)              W2 (09:00 - 10:00)
+-----------+                   +-----------+
|   a       |  ...sensor off... |       b   |
+-----------+                   +-----------+

I would like to assign `b` to windows W2 and W1 so that I can calculate
the runtime during W1 as b - a. The value `a` was recorded when the
sensor started. Because W1 contains no other values, the sensor must
have been shut down within 15 minutes. The value `b` was recorded the
following day, when the sensor was next started. By calculating b - a
I can find out for how many seconds the sensor was running during W1
(the result would be between 0 and 15 minutes, i.e. between 0 and 900
seconds).
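
To make the calculation concrete, here is a minimal sketch in plain Java
(no Flink involved). The class `RuntimeAttribution`, the `Reading` type
and the sample values are hypothetical and only illustrate the idea. It
assumes that readings arrive sorted by timestamp, that the counter is in
seconds, and that the whole difference between two consecutive readings
belongs to the hour of the earlier reading:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RuntimeAttribution {
    static final long HOUR_MS = 3_600_000L;

    /** Hypothetical reading: event time in ms and cumulative runtime counter in seconds. */
    static final class Reading {
        final long timestampMs;
        final long counterSeconds;

        Reading(long timestampMs, long counterSeconds) {
            this.timestampMs = timestampMs;
            this.counterSeconds = counterSeconds;
        }
    }

    /** Start of the epoch-aligned one-hour window that contains the timestamp. */
    static long hourStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, HOUR_MS);
    }

    /**
     * Adds the counter difference of each pair of consecutive readings to the
     * window of the earlier reading. Assumes readings are sorted by timestamp.
     */
    static Map<Long, Long> runtimePerWindow(List<Reading> readings) {
        Map<Long, Long> result = new TreeMap<>();
        for (int i = 1; i < readings.size(); i++) {
            Reading prev = readings.get(i - 1);
            Reading next = readings.get(i);
            long delta = next.counterSeconds - prev.counterSeconds;
            result.merge(hourStart(prev.timestampMs), delta, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // a: day 1 at 16:10, b: day 2 at 09:40 (ms since an hour-aligned origin)
        Reading a = new Reading(16 * HOUR_MS + 600_000L, 1000L);
        Reading b = new Reading(33 * HOUR_MS + 2_400_000L, 1600L);
        // The 600 seconds of runtime end up in the window starting at 16:00.
        System.out.println(runtimePerWindow(List.of(a, b)));
    }
}
```

The point of the sketch is that `b` is needed twice: once as the end
value for W1 and once as the start value for W2.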

>Have you tried using a sliding window where the `slide` is `size - 1ms`?
>With this, you would ensure that elements that fall exactly on the
>boundary, i.e. your hourly sensor updates would end up in both of the
>consecutive windows. It seems a bit unorthodox but could work in your
>case.

I've only tried a sliding window with a size of two hours and a `slide`
of one hour. My idea was to keep the window start and end aligned with
full hours. If I were to use a `slide` of `size - 1ms`, wouldn't that
cause a growing misalignment with full hours?
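
A small sketch of the arithmetic behind that concern, in plain Java. It
assumes that window starts are multiples of the `slide` counted from a
zero offset, which is how I understand the alignment to work; the class
and method names are made up for this example:

```java
public class SlideAlignment {
    static final long HOUR_MS = 3_600_000L;

    /** Latest window start at or before the timestamp, assuming starts are multiples of the slide. */
    static long windowStart(long timestampMs, long slideMs) {
        return timestampMs - Math.floorMod(timestampMs, slideMs);
    }

    /** How many milliseconds the given start lies before the next full hour (0 if aligned). */
    static long msBeforeFullHour(long startMs) {
        return Math.floorMod(-startMs, HOUR_MS);
    }

    public static void main(String[] args) {
        long slide = HOUR_MS - 1; // size - 1ms for a one-hour window

        long afterOneDay = 24 * HOUR_MS;
        long afterThirtyDays = 720 * HOUR_MS;

        // With a one-hour slide the start stays on the full hour.
        System.out.println(msBeforeFullHour(windowStart(afterOneDay, HOUR_MS))); // 0
        // With size - 1ms the start drifts by 1 ms per window.
        System.out.println(msBeforeFullHour(windowStart(afterOneDay, slide))); // 24
        System.out.println(msBeforeFullHour(windowStart(afterThirtyDays, slide))); // 720
    }
}
```

Under that assumption the n-th window starts at n * (size - 1ms), so it
lies n milliseconds before the n-th full hour.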

My issue with a sliding window is that I can't know how far in the
future the value `b` is. Therefore I can't define a window size that is
long enough to include both values `a` and `b`. Here is an illustration
of that situation:

W1 (16:00 - 18:00)              W2 (09:00 - 10:00)
+-----------------------+       +------------
|   a       |           |  ...  |       b   | ...
+-----------------------+       +------------

Here the window size is two hours and the `slide` is one hour. I would
calculate the runtime for the first half of the window (16:00 - 17:00).
Here the problem is that the value `b` is so far in the future that
it isn't assigned to the window W1 and I can't calculate the runtime for
the hour 16:00 - 17:00.
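
The same situation as a plain Java sketch. It lists the sliding windows
that contain a timestamp, assuming windows are [start, start + size) and
starts are multiples of the `slide` with a zero offset. The names and
timestamps are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingAssignment {
    static final long HOUR_MS = 3_600_000L;

    /** Starts of all sliding windows [start, start + size) that contain the timestamp. */
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    /** True if at least one sliding window contains both timestamps. */
    static boolean shareWindow(long t1, long t2, long sizeMs, long slideMs) {
        List<Long> common = new ArrayList<>(windowStarts(t1, sizeMs, slideMs));
        common.retainAll(windowStarts(t2, sizeMs, slideMs));
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        long size = 2 * HOUR_MS;
        long slide = HOUR_MS;
        long a = 16 * HOUR_MS + 600_000L;        // day 1, 16:10
        long sameDay = 17 * HOUR_MS + 1_200_000L; // day 1, 17:20
        long b = 33 * HOUR_MS + 2_400_000L;       // day 2, 09:40

        System.out.println(shareWindow(a, sameDay, size, slide)); // true
        System.out.println(shareWindow(a, b, size, slide));       // false
    }
}
```

Any fixed window size has the same limitation as soon as the gap between
`a` and `b` is longer than the window.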

Best regards,
Larry
