I believe that by default windows will only trigger one time [1]. This has
definitely caught me by surprise before.
I think that default strategy might be fine for a batch pipeline,
but it typically does not work for streaming (which I assume you’re using
because you mentioned Flink).
I believe you’ll want to add a non-default triggering mechanism to the
window strategy that you mentioned. I would recommend reading through the
triggering docs [2] for background. The Repeatedly.forever [3] function may
work for your use case. Something like:
Window.into(FixedWindows.of(Duration.standardMinutes(10)))
    // fire repeatedly instead of only once per window
    .triggering(Repeatedly.forever(AfterWatermark.pastEndOfWindow()))
    // drop anything that arrives after the watermark passes
    .withAllowedLateness(Duration.ZERO)
    // emit only new elements in each pane, not the whole window again
    .discardingFiredPanes();
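One caveat, given that your data has been accumulating for several days:
with .withAllowedLateness(Duration.ZERO), any record that arrives behind the
watermark will simply be dropped rather than trigger another firing. A
variant worth considering (an untested sketch; the three-day bound is just an
example value you would tune to the size of your backlog) keeps the watermark
trigger but adds late firings and a non-zero lateness horizon:

Window.into(FixedWindows.of(Duration.standardMinutes(10)))
    .triggering(AfterWatermark.pastEndOfWindow()
        // fire again for each element that arrives behind the watermark
        .withLateFirings(AfterPane.elementCountAtLeast(1)))
    // example horizon: accept records up to 3 days behind the watermark
    .withAllowedLateness(Duration.standardDays(3))
    .discardingFiredPanes();

The triggering docs [2] cover how allowed lateness interacts with late
firings in more detail.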
[1] https://beam.apache.org/documentation/programming-guide/#default-trigger
[2] https://beam.apache.org/documentation/programming-guide/#triggers
[3] https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/transforms/windowing/Repeatedly.html
On Mon, Jun 14, 2021 at 07:39 Eddy G <[email protected]> wrote:
> As seen in this image https://imgur.com/a/wrZET97, I'm trying to deal
> with late data (I intentionally stopped my consumer, so data has been
> accumulating for several days now). I'm using Beam 2.27 and Flink 1.12,
> with the following Window...
>
>
> Window.into(FixedWindows.of(Duration.standardMinutes(10)))
>
> And after several parsing stages, once it's time to write within the
> ParquetIO stage...
>
> FileIO
> .<String, MyClass>writeDynamic()
> .by(...)
> .via(...)
> .to(...)
> .withNaming(...)
> .withDestinationCoder(StringUtf8Coder.of())
> .withNumShards(options.getNumShards())
>
> it won't send bytes across all stages, so no data is being written; it
> just accumulates in the first stage seen in the image and won't go any
> further than that.
>
> Any reason why this may be happening? Wrong windowing strategy?
>