Hi all,

I'm developing a NiFi flow to convert a set of CSV files to Parquet format
and upload them to an S3 bucket. I use a 'ConvertRecord' processor with a
CSV reader and a Parquet record set writer to convert the data, and a
'PutS3Object' processor to send it to the S3 bucket.

When converting, I need to make sure the Parquet row group size is 256 MB
and each Parquet file contains only one row group. Even though it is
possible to set the row group size in ParquetRecordSetWriter, I couldn't
find a way to make sure each Parquet file contains only one row group (if a
CSV file contains more data than fits in a single 256 MB row group, multiple
Parquet files should be generated).

I would be grateful if you could suggest a way to do this.

Thanks & Regards

*Vibhath Ileperuma*
