Re: performances of S3 writing with many buckets in parallel

2020-02-07 Thread Enrico Agnoli
I finally found the time to dig a little deeper into this and found the real problem. The culprit behind the slowdown is this piece of code:

performances of S3 writing with many buckets in parallel

2019-10-15 Thread Enrico Agnoli
Starting the discussion here after an initial conversation with the Ververica and AWS teams during Flink Forward. I'm investigating the performance of a Flink job that transports data from Kafka to an S3 sink. We are using a BucketingSink to write Parquet files. The bucketing logic divides the

Re: [DISCUSS] StreamingFile with ParquetBulkWriter bucketing limitations

2019-09-09 Thread Enrico Agnoli
ctory.
>
> Out of curiosity, I guess that in the BucketingSink you were using the AvroKeyValueSinkWriter, right?
>
> Cheers,
> Kostas
>
> On Fri, Aug 30, 2019 at 10:23 AM Enrico Agnoli wrote:
> >
> > StreamingFile limitations
> >
> > Hi

[DISCUSS] StreamingFile with ParquetBulkWriter bucketing limitations

2019-08-30 Thread Enrico Agnoli
StreamingFile limitations Hi community, I'm working on porting our code from `BucketingSink<>` to `StreamingFileSink`. In this case we use the sink to write Avro via Parquet, and the suggested implementation of the sink should be something like: ``` val parquetWriterFactory =