Hello Mark,
 
I shared the screenshot about ExecuteSQL properties.
 
This 3 properties together work very well for PostGRESQL in the NIFI with 2CPU/8Gb.
 
Without the properties "Set Auto Commit" to FALSE, the NIFI stuck and crash.
 
Thanks a lot for help
 
Minh 
 
 
 
Envoyé: vendredi 5 janvier 2024 à 15:52
De: "Mark Payne" <marka...@hotmail.com>
À: "users@nifi.apache.org" <users@nifi.apache.org>
Objet: Re: Hardware requirement for NIFI instance
Thanks for following up. That actually makes sense. I don’t think Output Batch Size will play a very big role here. But Fetch Size, if I understand correctly, is essentially telling the JDBC Driver “Here’s how many rows you should pull back at once.” And so it’s going to buffer all of those rows into memory until it has written out all of them.
 
So if you set Fetch Size = 0, it’s going to pull back all rows in your database into memory. To be honest, I cannot imagine a single scenario where that’s desirable. We should probably set the default to something reasonable like 1,000 or 10,000 at most. And in 2.0, where we have the ability to migrate old configurations we should automatically change any config that has Fetch Size of 0 to the default value.
 
@Matt Burgess, et al., any concerns with that?
 
Thanks
-Mark
 
 
On Jan 5, 2024, at 9:45 AM, e-soci...@gmx.fr wrote:
 
So after some tests, here the result perhaps could help someone. 
 
With nifi (2CPU / 8Go Ram)

I have tested with these couples properties :
 
> 1 executeSQL with "select * from table"
Output Batch Size : 1
Fetch Size : 100000 
 
> 2 executeSQL with "select * from table"
Output Batch Size : 1
Fetch Size : 200000 
 
> 2 executeSQL with "select * from table"
Output Batch Size : 1
Fetch Size : 400000 
and started 5 executeSQL in the same time
 
The 5 processors work perfectly and receive 5 avro files with same size.
And during the test, the memory is stable and the Web UI works perfectly
 
 
FAILED TEST "OUT OF MEMORY" if the properties are :
 
> 1 executeSQL with "select * from table"
Output Batch Size : 0
Fetch Size : 0
Regards 
 
 
Envoyé: vendredi 5 janvier 2024 à 08:12
De: "Matt Burgess" <mattyb...@apache.org>
À: users@nifi.apache.org
Objet: Re: Hardware requirement for NIFI instance
You may not need to merge if your Fetch Size is set appropriately. For
your case I don't recommend setting Max Rows Per Flow File because you
still have to wait for all the results to be processed before the
FlowFile(s) get sent "downstream". Also if you set Output Batch Size
you can't use Merge downstream as ExecuteSQL will send FlowFiles
downstream before it knows the total count.

If you have a NiFi cluster and not a standalone instance you MIGHT be
able to represent your complex query using GenerateTableFetch and use
a load-balanced connection to grab different "pages" of the table in
parallel with ExecuteSQL. Those can be merged later as long as you get
all the FlowFiles back to a single node. Depending on how complex your
query is then it's a long shot but I thought I'd mention it just in
case.

Regards,
Matt


On Thu, Jan 4, 2024 at 1:41 PM Pierre Villard
<pierre.villard...@gmail.com> wrote:
>
> You can merge multiple Avro flow files with MergeRecord with an Avro Reader and an Avro Writer
>
> Le jeu. 4 janv. 2024 à 22:05, <e-soci...@gmx.fr> a écrit :
>>
>> And the important thing for us it has only one avro file by table.
>>
>> So it is possible to merge avro files to one avro file ?
>>
>> Regards
>>
>>
>> Envoyé: jeudi 4 janvier 2024 à 19:01
>> De: e-soci...@gmx.fr
>> À: users@nifi.apache.org
>> Cc: users@nifi.apache.org
>> Objet: Re: Hardware requirement for NIFI instance
>>
>> Hello all,
>>
>> Thanks a lot for the reply.
>>
>> So for more details.
>>
>> All the properties for the ExecuteSQL are set by default, except "Set Auto Commit: false".
>>
>> The sql command could not be more simple than "select * from ${db.table.fullname}"
>>
>> The nifi version is 1.16.3 and 1.23.2
>>
>> I have also test the same sql command in the another nifi (8 cores/ 16G Ram) and it is working.
>> The result is the avro file with 1.6GB
>>
>> The detail about the output flowfile :
>>
>> executesql.query.duration
>> 245118
>> executesql.query.executiontime
>> 64122
>> executesql.query.fetchtime
>> 180996
>> executesql.resultset.index
>> 0
>> executesql.row.count
>> 14961077
>>
>> File Size
>> 1.62 GB
>>
>> Regards
>>
>> Minh
>>
>>
>> Envoyé: jeudi 4 janvier 2024 à 17:18
>> De: "Matt Burgess" <mattyb...@apache.org>
>> À: users@nifi.apache.org
>> Objet: Re: Hardware requirement for NIFI instance
>> If I remember correctly, the default Fetch Size for Postgresql is to
>> get all the rows at once, which can certainly cause the problem.
>> Perhaps try setting Fetch Size to something like 1000 or so and see if
>> that alleviates the problem.
>>
>> Regards,
>> Matt
>>
>> On Thu, Jan 4, 2024 at 8:48 AM Etienne Jouvin <lapinoujou...@gmail.com> wrote:
>> >
>> > Hello.
>> >
>> > I also think the problem is more about the processor, I guess ExecuteSQL.
>> >
>> > Should play with batch configuration and commit flag to commit intermediate FlowFile.
>> >
>> > The out of memory exception makes me believe the full table is retrieved, and if it is huge the FlowFile content is very large.
>> >
>> >
>> >
>> >
>> > Le jeu. 4 janv. 2024 à 14:37, Pierre Villard <pierre.villard...@gmail.com> a écrit :
>> >>
>> >> It should be memory efficient so I think this is likely a configuration aspect of your processor. Can you share the configuration for all properties?
>> >> As a side note: if NiFi ran out of memory, you'd always want to restart it because you are never sure what's the state of the JVM after an OOME.
>> >>
>> >> Le jeu. 4 janv. 2024 à 17:26, <e-soci...@gmx.fr> a écrit :
>> >>>
>> >>>
>> >>> Hello all,
>> >>>
>> >>> Who could help me to determine the cpu/memory need for nifi instance to fetch the data from Postgresql hosted in google ?
>> >>>
>> >>> We got this error :
>> >>> ==> Error : executesql.error.message
>> >>> Ran out of memory retrieving query results.
>> >>>
>> >>> The procesor ExecuteSQL has this config : Set Auto Commit ==> false
>> >>> driver Jar to use : postgresql-42.7.1.jar
>> >>> Java version : jdk-11.0.19
>> >>>
>> >>> Table information :
>> >>> rows number : 14958836
>> >>> fields number : 20
>> >>>
>> >>> Linux Rocky8
>> >>>
>> >>> Architecture: x86_64
>> >>> CPU op-mode(s): 32-bit, 64-bit
>> >>> Byte Order: Little Endian
>> >>> CPU(s): 2
>> >>> On-line CPU(s) list: 0,1
>> >>> Thread(s) per core: 2
>> >>> Core(s) per socket: 1
>> >>> Socket(s): 1
>> >>> NUMA node(s): 1
>> >>> Vendor ID: GenuineIntel
>> >>> BIOS Vendor ID: Google
>> >>> CPU family: 6
>> >>> Model: 85
>> >>> Model name: Intel(R) Xeon(R) CPU @ 2.80GHz
>> >>> Stepping: 7
>> >>> CPU MHz: 2800.286
>> >>> BogoMIPS: 5600.57
>> >>> Hypervisor vendor: KVM
>> >>> Virtualization type: full
>> >>> L1d cache: 32K
>> >>> L1i cache: 32K
>> >>> L2 cache: 1024K
>> >>> L3 cache: 33792K
>> >>> NUMA node0 CPU(s): 0,1
>> >>>
>> >>> Memory : 8GB
>> >>>
>> >>> Thanks for you helps
>> >>>
>> >>> Minh
>>
>>
 
 

Reply via email to