Issue when extracting a large dataset into CSV files

2021-05-09 Thread Vibhath Ileperuma
Hi All, I'm using NIFI to extract a postgreSQL table into a csv file. I'm using an 'executeSQLRecord' processor with a 'CSVRecordSetWriter' to extract data in csv format and a PutFile processor to write the csv file into the machine. >From a single sql query, around 60 million rows can be fetched.

Re: Issue when extracting a large dataset into CSV files

2021-05-09 Thread Vijay Chhipa
Hi Vibhath, There is this property on the processor Max Rows Per Flow File Per docs: If the value specified is zero, then all rows are returned in a single FlowFile. Which seems to be what is happening in your case. > On May 9, 2021, at 12:11 PM, Vibhath Ileperuma > wrote: > > execu

Re: Issue when extracting a large dataset into CSV files

2021-05-10 Thread Vibhath Ileperuma
Hi Vijay, Eventhough that property is set to a postive value, same issue happens. Seems like NIFI first loads all the rows to RAM and write to multiple files when that property is set to a non-zero value. On the other hand, if this property is set to a very small value, a large no of flow files ca

Re: Issue when extracting a large dataset into CSV files

2021-05-10 Thread Matt Burgess
Vibath, What is the "Fetch Size" property set to? It looks like PostgreSQL will load all results if Fetch Size is set to zero [1]. Try setting it to 1 or something like that, whatever doesn't use too much memory but doesn't slow down the performance too much. Regards, Matt [1] https://jdbc.p

Re: Issue when extracting a large dataset into CSV files

2021-05-11 Thread Vibhath Ileperuma
Hi Matt, I have set the fetch size to 2. I tried with small values like 100. Got the same issue with the small sizes. Further I limited the row count fetched by the sql by adding a limit. Then I didn't got the issue even with large fetch sizes. Thank you. On Mon, 10 May 2021, 8:05 pm Matt B

Re: Issue when extracting a large dataset into CSV files

2021-05-14 Thread Vibhath Ileperuma
Hi All, To further debug this issue, I set the fetch size to 50 and Max rows per file to 100 as shown below image. [image: image.png] I have set the initial heap size to 8GB and Max heap size to 15GB in the bootstrap.conf file. I started monitoring CPU and Memory with a 'top' command before star