Hi, Tim. 

If you look at the doc here 
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem/#format-types-1,
you just need to write a custom `AvroWriterFactory` to which you can
pass parameters such as codecs for your Avro writer. Despite what the name
suggests, this approach applies to bulk formats in general, which include
Parquet files as well. I copy the example below:

```
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.formats.avro.AvroBuilder;
import org.apache.flink.formats.avro.AvroWriterFactory;
import org.apache.flink.streaming.api.datastream.DataStream;

// Build a factory that creates an Avro DataFileWriter configured with the codec you want.
AvroWriterFactory<Address> factory = new AvroWriterFactory<>((AvroBuilder<Address>) out -> {
        Schema schema = ReflectData.get().getSchema(Address.class);
        DatumWriter<Address> datumWriter = new ReflectDatumWriter<>(schema);

        DataFileWriter<Address> dataFileWriter = new DataFileWriter<>(datumWriter);
        dataFileWriter.setCodec(CodecFactory.snappyCodec());  // pick the compression codec here
        dataFileWriter.create(schema, out);
        return dataFileWriter;
});

DataStream<Address> stream = ...
stream.sinkTo(FileSink.forBulkFormat(
        outputBasePath,
        factory).build());
```
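
Since you are writing Parquet rather than Avro container files, here is a minimal sketch of the Parquet-side equivalent: a custom `ParquetWriterFactory` whose `ParquetBuilder` sets the compression codec on the underlying `AvroParquetWriter` builder, mirroring what `AvroParquetWriters.forReflectRecord(...)` does internally. This is an assumption-laden sketch, not tested code: it assumes the flink-parquet module is on your classpath, reuses your `PlayerEvent` POJO from the pipeline you described, and uses `events` and `outputBasePath` as placeholder names.

```
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.formats.parquet.ParquetBuilder;
import org.apache.flink.formats.parquet.ParquetWriterFactory;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

// Sketch: a ParquetWriterFactory that mirrors AvroParquetWriters.forReflectRecord(...)
// but sets an explicit compression codec on the AvroParquetWriter it builds.
// The schema is captured as a String so the lambda stays serializable.
Schema schema = ReflectData.get().getSchema(PlayerEvent.class);
String schemaString = schema.toString();

ParquetWriterFactory<PlayerEvent> parquetFactory = new ParquetWriterFactory<>(
        (ParquetBuilder<PlayerEvent>) out -> AvroParquetWriter.<PlayerEvent>builder(out)
                .withSchema(new Schema.Parser().parse(schemaString))
                .withDataModel(ReflectData.get())
                .withCompressionCodec(CompressionCodecName.SNAPPY)  // or GZIP, ZSTD, ...
                .build());

DataStream<PlayerEvent> events = ...
events.sinkTo(FileSink.forBulkFormat(outputBasePath, parquetFactory).build());
```

As far as I can tell, the Parquet writer builder defaults to UNCOMPRESSED when no codec is set, which would explain why you couldn't find a codec configured anywhere in `AvroParquetWriters`.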
Best,
Tiansu 
 

> On 24. 02 2023, at 14:17, Tim Josefsson <tim.josefs...@webstep.se> wrote:
> 
> I'm writing a Flink processor that will read a bunch of JSON records from 
> Kafka and then write them to S3 in parquet format using the FileSink. I've 
> got most things in place, the only thing I haven't been able to figure out is 
> how to change the compression codec used by the writer. Is there any 
> recommended way to do this? Currently I'm using the 
> AvroParquetWriters.forReflectRecord(PlayerEvent.class) to transform my POJOs 
> to Avro and then write them as Parquet files. I've looked into the 
> AvroParquetWriters class but couldn't figure out how to configure the 
> compression codec (or even what codec was used). Is there a way to configure 
> this or do I have to write my own implementation of the Parquet writer and if 
> so, how would one do that?
> 
> Thankful for any help,
> Tim
