Hi My processing pipeline utilizes GenerateSequence transform to query BigQuery periodically. It works, the query is performed exactly with rate defined by GenerateSequence but the watermark is always ~40 minutes behind generated timestamps.
The code looks as follows (Beam: 2.19, Scio: 0.8.4): val sc: ScioContext = ... val startTime = Instant.now() val interval = Duration.standardMinutes(10) val sequence = sc.customInput( GenerateSequence.from(1) .withRate(1, interval) .withTimestampFn(i => startTime.plus(interval.multipliedBy(i)))) .withGlobalWindow(WindowOptions( trigger = Repeatedly.forever(AfterPane.elementCountAtLeast(1)), accumulationMode = AccumulationMode.DISCARDING_FIRED_PANES)) .withTimestamp .map { case (_, timestamp) => timestamp } val finalStream = sequence.flatMap { timestamp => // load data from BQ, the timestamps from the sequence are preserved } I'm looking for the reason for the GenerateSequence watermark lag when the code is running on Dataflow runner. Any clue how to debug the issue further? Marcin