Hi


I'm experimenting with the Dataflow SQL streaming extensions and I have
observed that the event_timestamp field in the message payload seems to be
ignored.



I would like to calculate the average of the values reported by the sensor
over 5-second windows.



SELECT CURRENT_TIMESTAMP() AS created_at, *
FROM
    (SELECT
        s1.window_start AS window_start,
        s1.window_end AS window_end,
        MIN(event_timestamp) AS event_timestamp_min,
        MAX(event_timestamp) AS event_timestamp_max,
        AVG(s1.sensor_value) AS sensor_value_avg
    FROM TUMBLE(
        (SELECT * FROM pubsub.topic.`sc-9366-nga-dev`.`marcin-atm22-signal-1`),
        DESCRIPTOR(event_timestamp),
        "INTERVAL 5 SECOND"
        ) AS s1
    GROUP BY window_start, window_end)



For testing purposes the sensor data is artificially generated, and
event_timestamp is always 30 seconds behind the current wallclock time.



current timestamp: 2022-08-05T15:00:24+00:00
{'event_timestamp': '2022-08-05T14:59:54+00:00', 'sensor_value': 0.4083962116009032}
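The generator is essentially the following sketch (the actual publisher loop
and Pub/Sub client setup are omitted; only the payload construction matters
here):

```python
import json
import random
from datetime import datetime, timedelta, timezone

def build_message(now=None):
    """Build one sensor payload; event_timestamp lags wallclock by 30 s."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "event_timestamp": (now - timedelta(seconds=30)).isoformat(),
        "sensor_value": random.random(),
    }).encode("utf-8")

# Each payload is then published to the Pub/Sub topic, e.g. with
# google.cloud.pubsub_v1.PublisherClient().publish(topic_path, build_message())
```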



But at 15:00:28 I get the following result (the latest row stored in BQ):

[{
  "created_at": "2022-08-05T15:00:20.170Z",
  "window_start": "2022-08-05T15:00:05Z",
  "window_end": "2022-08-05T15:00:10Z",
  "event_timestamp_min": "2022-08-05T15:00:05.019Z",
  "event_timestamp_max": "2022-08-05T15:00:09.035Z",
  "sensor_value_avg": "0.1612730883"
}]



Why is there a record created at 15:00:20 with a window of 15:00:05-15:00:10
if the input event_timestamp is always delayed by 30 seconds? At 15:00:20 the
latest emitted sensor event_timestamp is ~14:59:50.
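To make the mismatch concrete, here is the arithmetic I expect (a sketch,
assuming an ideal watermark that tracks the newest event_timestamp seen so
far):

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
wallclock = datetime(2022, 8, 5, 15, 0, 20, tzinfo=UTC)

# Events always lag wallclock by 30 s, so the newest event_timestamp
# the pipeline can have seen at 15:00:20 is 14:59:50.
newest_event = wallclock - timedelta(seconds=30)

# The watermark cannot be ahead of that, so the last 5-second tumbling
# window that could close by now ends at the 5-second boundary at or
# before 14:59:50.
window_end = newest_event - timedelta(
    seconds=newest_event.second % 5,
    microseconds=newest_event.microsecond,
)
# window_end is 14:59:50, i.e. the last closable window is
# [14:59:45, 14:59:50) -- not [15:00:05, 15:00:10) as observed.
```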



Moreover, the watermark lag reported by Dataflow is always 10-20 seconds,
even though the event_timestamp reported by the sensor is far behind the
wallclock.

Any ideas?

Regards,

Marcin
