Hi Eddy,

does your data get buffered in state - e.g. does the size of the state grow over time? Do you see the watermark being updated in the Flink Web UI? When a stateful operation (and GroupByKey is a stateful operation) does not output any data, the first place to look is whether the watermark progresses correctly. If it does not progress, the input data must be buffered in state, and the size of the state should grow over time. If it does progress, it might be that the data arrives too late after the watermark (the watermark estimator might need tuning) and gets dropped (note you don't set any allowed lateness, which _might_ cause issues). You can check whether your pipeline drops data via the "droppedDueToLateness" metric. In that situation the size of your state would not grow much.
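To illustrate the allowed-lateness point: a minimal sketch of how the windowing could be extended so late elements are kept instead of silently dropped. This assumes the Beam Java SDK on the classpath; the seven-day lateness bound and the trigger/accumulation choices are illustrative, not a recommendation for your workload.

```java
// Sketch (assumes org.apache.beam.sdk.* on the classpath).
// Elements up to 7 days behind the watermark are admitted instead of
// being counted in droppedDueToLateness; tune the bound to your backlog.
input.apply(
    Window.<MyClass>into(FixedWindows.of(Duration.standardMinutes(10)))
        .withAllowedLateness(Duration.standardDays(7))
        .triggering(AfterWatermark.pastEndOfWindow())
        .discardingFiredPanes());
```

With allowed lateness set, windows stay open for late firings until the watermark passes end-of-window plus the lateness bound, so a multi-day backlog would not be discarded.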

Another hint - if you use KafkaIO, try disabling its SDF wrapper by passing "--experiments=use_deprecated_read" on the command line (which you must then forward to PipelineOptionsFactory). There is some suspicion that the SDF wrapper for Kafka might not work as expected with Flink in certain situations.
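For completeness, a sketch of what "pass it to PipelineOptionsFactory" means in practice - the flag only takes effect if the command-line args reach the factory. This assumes the usual Beam main-method setup; the class and option names besides the experiment flag are placeholders.

```java
// Sketch (assumes the Beam Java SDK): forwarding command-line args so
// --experiments=use_deprecated_read is picked up, making KafkaIO fall
// back to the legacy UnboundedSource read instead of the SDF wrapper.
public static void main(String[] args) {
  PipelineOptions options =
      PipelineOptionsFactory.fromArgs(args).withValidation().create();
  Pipeline pipeline = Pipeline.create(options);
  // ... build the pipeline, then:
  pipeline.run();
}
// Invoke with e.g.:
//   --runner=FlinkRunner --experiments=use_deprecated_read
```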

Please feel free to share any results,

  Jan

On 6/14/21 1:39 PM, Eddy G wrote:
As seen in this image https://imgur.com/a/wrZET97, I'm trying to deal with late 
data (I intentionally stopped my consumer, so data has been accumulating for 
several days now). Now, with the following Window... I'm using Beam 2.27 and 
Flink 1.12.

                             Window.into(FixedWindows.of(Duration.standardMinutes(10)))

And several parsing stages after, once it's time to write within the ParquetIO 
stage...

                             FileIO
                                 .<String, MyClass>writeDynamic()
                                 .by(...)
                                 .via(...)
                                 .to(...)
                                 .withNaming(...)
                                 .withDestinationCoder(StringUtf8Coder.of())
                                 .withNumShards(options.getNumShards())

it won't send bytes across all stages, so no data is being written; it just 
accumulates in the first stage seen in the image and won't go further than that.

Any reason why this may be happening? Wrong windowing strategy?
