Hello,

What would the reason be to need only one row group per file? Parquet
files by design can have many row groups.
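
For reference, with parquet-mr the row group size is just a setting on the
writer. A minimal, untested sketch of writing Avro records with 256 MB row
groups (the class name, file path, and schema are made up for illustration):

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class WriteParquetExample {
        public static void main(String[] args) throws Exception {
            Schema schema = SchemaBuilder.record("User").fields()
                .requiredString("name").requiredInt("age").endRecord();

            // 256 MB row groups: the writer starts a new row group in the
            // SAME file each time the buffered data reaches this size.
            try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(new Path("users.parquet"))
                         .withSchema(schema)
                         .withRowGroupSize(256 * 1024 * 1024)
                         .build()) {
                GenericRecord rec = new GenericData.Record(schema);
                rec.put("name", "alice");
                rec.put("age", 30);
                writer.write(rec);
            }
        }
    }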

The ParquetRecordSetWriter won't be able to do this, since it is just
given an output stream to write all of the records to, and that happens
to be the OutputStream for a single flow file.
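
If you really need one row group per file, I think you'd have to do the
splitting yourself outside of the record writer, e.g. in a custom processor
or standalone job that rolls over to a new file at the size threshold. A
rough, untested sketch, where newWriter(int) is an assumed helper that
builds a ParquetWriter like the one above and 'records' is an assumed
Iterator<GenericRecord>:

    // Close the current writer and open a new file once ~256 MB has been
    // written, so each output file ends up holding a single row group.
    long maxBytes = 256L * 1024 * 1024;
    int fileIndex = 0;
    ParquetWriter<GenericRecord> writer = newWriter(fileIndex);
    while (records.hasNext()) {
        writer.write(records.next());
        if (writer.getDataSize() >= maxBytes) {  // flushed + buffered bytes
            writer.close();                      // writes row group + footer
            writer = newWriter(++fileIndex);
        }
    }
    writer.close();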

-Bryan

On Fri, Mar 19, 2021 at 10:31 AM Vibhath Ileperuma
<vibhatharunapr...@gmail.com> wrote:
>
> Hi all,
>
> I'm developing a NiFi flow to convert a set of CSV data to Parquet format
> and upload the results to an S3 bucket. I use a 'ConvertRecord' processor
> with a CSV reader and a Parquet record set writer to convert the data, and
> a 'PutS3Object' processor to send it to the S3 bucket.
>
> When converting, I need to make sure the Parquet row group size is 256 MB
> and that each Parquet file contains only one row group. Even though it is
> possible to set the row group size in ParquetRecordSetWriter, I couldn't
> find a way to ensure each Parquet file contains only one row group (if a
> CSV file contains more data than needed for a 256 MB row group, multiple
> Parquet files should be generated).
>
> I would be grateful if you could suggest a way to do this.
>
> Thanks & Regards
>
> Vibhath Ileperuma
