I finally found the time to dig a little deeper into this and found the real
problem.
The culprit of the slow-down is this piece of code:
I'm starting the discussion here after an initial conversation with the
Ververica and AWS teams during FlinkForward.
I'm investigating the performance of a Flink job that transports data from
Kafka to an S3 Sink.
We are using a BucketingSink to write Parquet files. The bucketing logic
divides the
ctory.
>
> Out of curiosity, I guess that in the BucketingSink you were using the
> AvroKeyValueSinkWriter, right?
>
> Cheers,
> Kostas
>
> On Fri, Aug 30, 2019 at 10:23 AM Enrico Agnoli
> wrote:
> >
> > StreamingFile limitations
> >
> > Hi
StreamingFile limitations
Hi community,
I'm working on porting our code from `BucketingSink<>` to
`StreamingFileSink`.
In this case we use the sink to write Avro via Parquet, and the suggested
implementation of the sink should be something like:
```
val parquetWriterFactory =