[DISCUSS] StreamingFile with ParquetBulkWriter bucketing limitations

2019-08-30 Thread Enrico Agnoli
Hi community, I'm working on porting our code from `BucketingSink<>` to `StreamingFileSink`. In this case we use the sink to write Avro via Parquet, and the suggested implementation of the sink should be something like: ``` val parquetWriterFactory = Parquet… ```
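[The code preview above is truncated in the archive. For reference, a minimal sketch of the setup the Flink documentation suggests for writing Avro via Parquet with `StreamingFileSink`; `MyRecord`, `outputPath`, and `stream` are placeholders, not names from this thread:]

```scala
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

// Build a BulkWriter.Factory that writes Avro SpecificRecords as Parquet.
val parquetWriterFactory = ParquetAvroWriters.forSpecificRecord(classOf[MyRecord])

// StreamingFileSink in bulk-format mode. Note the limitation discussed in
// this thread: bulk formats roll files on every checkpoint and the factory
// gives no hook for per-bucket customization of the Parquet writer.
val sink: StreamingFileSink[MyRecord] = StreamingFileSink
  .forBulkFormat(new Path(outputPath), parquetWriterFactory)
  .build()

stream.addSink(sink)
```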

Re: [DISCUSS] StreamingFile with ParquetBulkWriter bucketing limitations

2019-09-09 Thread Kostas Kloudas
Hi Enrico, Sorry for the late reply. I think your understanding is correct. The best way to do it is to write your own ParquetBulkWriter and the corresponding factory. Out of curiosity, I guess that in the BucketingSink you were using the AvroKeyValueSinkWriter, right? Cheers, Kostas
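[For readers following along, a hedged sketch of what "your own ParquetBulkWriter and the corresponding factory" could look like. The class names, the Snappy codec, and passing the schema as a string for serializability are assumptions, not code from this thread; `StreamOutputFile` is the adapter flink-parquet itself uses to turn Flink's `FSDataOutputStream` into a Parquet `OutputFile`:]

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.api.common.serialization.BulkWriter
import org.apache.flink.core.fs.FSDataOutputStream
import org.apache.flink.formats.parquet.StreamOutputFile
import org.apache.parquet.avro.AvroParquetWriter
import org.apache.parquet.hadoop.ParquetWriter
import org.apache.parquet.hadoop.metadata.CompressionCodecName

// A custom BulkWriter that delegates to a Parquet writer (hypothetical name).
class MyParquetBulkWriter[T <: GenericRecord](writer: ParquetWriter[T])
    extends BulkWriter[T] {

  override def addElement(element: T): Unit = writer.write(element)

  // Parquet buffers whole row groups internally, so there is nothing
  // meaningful to flush before finish().
  override def flush(): Unit = ()

  override def finish(): Unit = writer.close()
}

// The corresponding factory. The schema travels as a string because
// org.apache.avro.Schema is not Serializable.
class MyParquetWriterFactory[T <: GenericRecord](schemaString: String)
    extends BulkWriter.Factory[T] {

  override def create(out: FSDataOutputStream): BulkWriter[T] = {
    val schema = new Schema.Parser().parse(schemaString)
    val writer = AvroParquetWriter
      .builder[T](new StreamOutputFile(out))
      .withSchema(schema)
      .withCompressionCodec(CompressionCodecName.SNAPPY)
      .build()
    new MyParquetBulkWriter(writer)
  }
}
```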

Re: [DISCUSS] StreamingFile with ParquetBulkWriter bucketing limitations

2019-09-09 Thread Enrico Agnoli
Thanks for confirming. We have a writer, ``` public class ParquetSinkWriter implements Writer ```, that handles the serialization of the data. We implemented it starting from:
https://medium.com/hadoop-noob/flink-parquet-writer-d127f745b519
https://stackoverflow.com/questions/48098011/how-to-use-apache-fli…
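[For context, the legacy writer being ported follows the `org.apache.flink.streaming.connectors.fs.Writer` interface from the BucketingSink connector. A hedged reconstruction along the lines of the linked posts; the field names and builder options are illustrative, not Enrico's actual code:]

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.streaming.connectors.fs.Writer
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.parquet.avro.AvroParquetWriter
import org.apache.parquet.hadoop.ParquetWriter

// Legacy BucketingSink-style writer: opened per bucket file, duplicated
// by the sink for each new bucket.
class ParquetSinkWriter[T <: GenericRecord](schemaString: String)
    extends Writer[T] {

  @transient private var writer: ParquetWriter[T] = _
  private var position: Long = 0L

  override def open(fs: FileSystem, path: Path): Unit = {
    val schema = new Schema.Parser().parse(schemaString)
    writer = AvroParquetWriter.builder[T](path).withSchema(schema).build()
  }

  override def write(element: T): Unit = {
    writer.write(element)
    position = writer.getDataSize
  }

  override def flush(): Long = position

  override def getPos(): Long = position

  override def close(): Unit = if (writer != null) writer.close()

  // Each bucket gets its own writer instance, so return a fresh copy.
  override def duplicate(): Writer[T] = new ParquetSinkWriter[T](schemaString)
}
```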