Re: Writing parquet files to S3

2021-03-22 Thread Joe Witt
Not responding to the real question in the thread, but regarding "I'm using NiFi 1.13.1": please switch to 1.13.2 right away due to a regression in 1.13.1.

Re: Writing parquet files to S3

2021-03-22 Thread Vibhath Ileperuma
Hi Bryan, I'm planning to add these generated Parquet files to an Impala S3 table. I noticed that Impala-written Parquet files contain only one row group; that's why I'm trying to write one row group per file. However, I tried to create small Parquet files (Snappy compressed) first and use a Merg

Re: Writing parquet files to S3

2021-03-19 Thread Bryan Bende
Hello, What would the reason be to need only one row group per file? Parquet files by design can have many row groups. The ParquetRecordSetWriter won't be able to do this since it is just given an output stream to write all the records to, which happens to be the output stream for one flow file.

Writing parquet files to S3

2021-03-19 Thread Vibhath Ileperuma
Hi all, I'm developing a NiFi flow to convert a set of CSV data to Parquet format and upload it to an S3 bucket. I use a 'ConvertRecord' processor with a CSV reader and a Parquet record set writer to convert the data, and a 'PutS3Object' processor to send it to the S3 bucket. When converting, I need to make su
