Re: GroupIntoShards not sending bytes further when dealing with huge amount of data

Jan Lukavský Mon, 14 Jun 2021 05:49:18 -0700

Hi Eddy,

does your data get buffered in a state - e.g. does the size of the state
grow over time? Do you see the watermark being updated in your Flink WebUI?
When a stateful operation (and GroupByKey is a stateful operation) does
not output any data, the first thing to check is whether the watermark
progresses correctly. If it does not progress, then the input data must
be buffered in state and the size of the state should grow over time. If
it does progress, then the data might be arriving too late behind the
watermark (the watermark estimator might need tuning) and getting
dropped (note that you don't set any allowed lateness, which _might_
cause issues). You can check whether your pipeline drops data in the
"droppedDueToLateness" metric. The size of your state would not grow much
in that situation.
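
To make the late-data case concrete, here is a minimal sketch in plain Java of when an element would be dropped under fixed windows. It is a simplified model, not Beam's actual implementation: the class LatenessSketch and its methods are hypothetical names, windows are assumed to be aligned to the epoch, and an element is treated as droppable once the watermark has reached the end of its window plus the allowed lateness.

```java
import java.time.Duration;
import java.time.Instant;

// Simplified model of late-data dropping with fixed windows.
// Hypothetical helper for illustration only; this is not Beam code.
class LatenessSketch {

    // Start of the fixed window containing ts (windows assumed aligned to the epoch).
    static Instant windowStart(Instant ts, Duration windowSize) {
        long sizeMs = windowSize.toMillis();
        long t = ts.toEpochMilli();
        return Instant.ofEpochMilli(t - Math.floorMod(t, sizeMs));
    }

    // True if the element's window has already expired at the given watermark,
    // i.e. watermark >= window end + allowed lateness.
    static boolean isDroppedAsLate(Instant eventTime, Instant watermark,
                                   Duration windowSize, Duration allowedLateness) {
        Instant windowEnd = windowStart(eventTime, windowSize).plus(windowSize);
        Instant expiry = windowEnd.plus(allowedLateness);
        return !watermark.isBefore(expiry);
    }

    public static void main(String[] args) {
        Duration tenMinutes = Duration.ofMinutes(10);
        Instant event = Instant.parse("2021-06-10T12:03:00Z"); // window [12:00, 12:10)

        // Watermark already past the window, no allowed lateness: dropped.
        System.out.println(isDroppedAsLate(
            event, Instant.parse("2021-06-10T12:30:00Z"), tenMinutes, Duration.ZERO));

        // Same watermark, one hour of allowed lateness: still accepted.
        System.out.println(isDroppedAsLate(
            event, Instant.parse("2021-06-10T12:30:00Z"), tenMinutes, Duration.ofHours(1)));

        // Watermark still inside the window: accepted.
        System.out.println(isDroppedAsLate(
            event, Instant.parse("2021-06-10T12:05:00Z"), tenMinutes, Duration.ZERO));
    }
}
```

In this model, a watermark that has jumped ahead of a backlog that is several days old would mark every backlog element as late unless the allowed lateness covers the gap, which is consistent with state that does not grow while nothing is written.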

Another hint - if you use KafkaIO, try to disable the SDF wrapper for it
using "--experiments=use_deprecated_read" on the command line (which you
then must pass to PipelineOptionsFactory). There is some suspicion that
the SDF wrapper for Kafka might not work as expected in certain situations
with Flink.


Please feel free to share any results,

  Jan

On 6/14/21 1:39 PM, Eddy G wrote:

As seen in this image https://imgur.com/a/wrZET97, I'm trying to deal with late 
data (I intentionally stopped my consumer, so data has been accumulating for 
several days now). I'm using Beam 2.27 and Flink 1.12, with the following 
window:

                             Window.into(FixedWindows.of(Duration.standardMinutes(10)))

And several parsing stages after, once it's time to write within the ParquetIO 
stage...

                             FileIO
                                 .<String, MyClass>writeDynamic()
                                 .by(...)
                                 .via(...)
                                 .to(...)
                                 .withNaming(...)
                                 .withDestinationCoder(StringUtf8Coder.of())
                                 .withNumShards(options.getNumShards())

it won't send bytes across all stages, so no data is being written. Instead, 
data accumulates in the first stage seen in the image and won't go any further.

Any reason why this may be happening? Wrong windowing strategy?
