You may not need to merge if your Fetch Size is set appropriately. For
your case I don't recommend setting Max Rows Per Flow File because you
still have to wait for all the results to be processed before the
FlowFile(s) get sent "downstream". Also if you set Output Batch Size
you can't use Merge downstream as ExecuteSQL will send FlowFiles
downstream before it knows the total count.

If you have a NiFi cluster and not a standalone instance you MIGHT be
able to represent your complex query using GenerateTableFetch and use
a load-balanced connection to grab different "pages" of the table in
parallel with ExecuteSQL. Those can be merged later as long as you get
all the FlowFiles back to a single node. Depending on how complex your
query is then it's a long shot but I thought I'd mention it just in
case.

Regards,
Matt


On Thu, Jan 4, 2024 at 1:41 PM Pierre Villard
<pierre.villard...@gmail.com> wrote:
>
> You can merge multiple Avro flow files with MergeRecord with an Avro Reader 
> and an Avro Writer
>
> Le jeu. 4 janv. 2024 à 22:05, <e-soci...@gmx.fr> a écrit :
>>
>> And the important thing for us it has only one avro file by table.
>>
>> So it is possible to merge avro files to one avro file ?
>>
>> Regards
>>
>>
>> Envoyé: jeudi 4 janvier 2024 à 19:01
>> De: e-soci...@gmx.fr
>> À: users@nifi.apache.org
>> Cc: users@nifi.apache.org
>> Objet: Re: Hardware requirement for NIFI instance
>>
>> Hello all,
>>
>> Thanks a lot for the reply.
>>
>> So for more details.
>>
>> All the properties for the ExecuteSQL are set by default, except "Set Auto 
>> Commit:  false".
>>
>> The sql command could not be more simple than "select * from 
>> ${db.table.fullname}"
>>
>> The nifi version is 1.16.3 and 1.23.2
>>
>> I have also test the same sql command in the another nifi (8 cores/ 16G Ram) 
>> and it is working.
>> The result is the avro file with 1.6GB
>>
>> The detail about the output flowfile :
>>
>> executesql.query.duration
>> 245118
>> executesql.query.executiontime
>> 64122
>> executesql.query.fetchtime
>> 180996
>> executesql.resultset.index
>> 0
>> executesql.row.count
>> 14961077
>>
>> File Size
>> 1.62 GB
>>
>> Regards
>>
>> Minh
>>
>>
>> Envoyé: jeudi 4 janvier 2024 à 17:18
>> De: "Matt Burgess" <mattyb...@apache.org>
>> À: users@nifi.apache.org
>> Objet: Re: Hardware requirement for NIFI instance
>> If I remember correctly, the default Fetch Size for Postgresql is to
>> get all the rows at once, which can certainly cause the problem.
>> Perhaps try setting Fetch Size to something like 1000 or so and see if
>> that alleviates the problem.
>>
>> Regards,
>> Matt
>>
>> On Thu, Jan 4, 2024 at 8:48 AM Etienne Jouvin <lapinoujou...@gmail.com> 
>> wrote:
>> >
>> > Hello.
>> >
>> > I also think the problem is more about the processor, I guess ExecuteSQL.
>> >
>> > Should play with batch configuration and commit flag to commit 
>> > intermediate FlowFile.
>> >
>> > The out of memory exception makes me believe the full table is retrieved, 
>> > and if it is huge the FlowFile content is very large.
>> >
>> >
>> >
>> >
>> > Le jeu. 4 janv. 2024 à 14:37, Pierre Villard <pierre.villard...@gmail.com> 
>> > a écrit :
>> >>
>> >> It should be memory efficient so I think this is likely a configuration 
>> >> aspect of your processor. Can you share the configuration for all 
>> >> properties?
>> >> As a side note: if NiFi ran out of memory, you'd always want to restart 
>> >> it because you are never sure what's the state of the JVM after an OOME.
>> >>
>> >> Le jeu. 4 janv. 2024 à 17:26, <e-soci...@gmx.fr> a écrit :
>> >>>
>> >>>
>> >>> Hello all,
>> >>>
>> >>> Who could help me to determine the cpu/memory need for nifi instance to 
>> >>> fetch the data from Postgresql hosted in google ?
>> >>>
>> >>> We got this error :
>> >>> ==> Error : executesql.error.message
>> >>> Ran out of memory retrieving query results.
>> >>>
>> >>> The procesor ExecuteSQL has this config : Set Auto Commit ==> false
>> >>> driver Jar to use : postgresql-42.7.1.jar
>> >>> Java version : jdk-11.0.19
>> >>>
>> >>> Table information :
>> >>> rows number : 14958836
>> >>> fields number : 20
>> >>>
>> >>> Linux Rocky8
>> >>>
>> >>> Architecture: x86_64
>> >>> CPU op-mode(s): 32-bit, 64-bit
>> >>> Byte Order: Little Endian
>> >>> CPU(s): 2
>> >>> On-line CPU(s) list: 0,1
>> >>> Thread(s) per core: 2
>> >>> Core(s) per socket: 1
>> >>> Socket(s): 1
>> >>> NUMA node(s): 1
>> >>> Vendor ID: GenuineIntel
>> >>> BIOS Vendor ID: Google
>> >>> CPU family: 6
>> >>> Model: 85
>> >>> Model name: Intel(R) Xeon(R) CPU @ 2.80GHz
>> >>> Stepping: 7
>> >>> CPU MHz: 2800.286
>> >>> BogoMIPS: 5600.57
>> >>> Hypervisor vendor: KVM
>> >>> Virtualization type: full
>> >>> L1d cache: 32K
>> >>> L1i cache: 32K
>> >>> L2 cache: 1024K
>> >>> L3 cache: 33792K
>> >>> NUMA node0 CPU(s): 0,1
>> >>>
>> >>> Memory : 8GB
>> >>>
>> >>> Thanks for you helps
>> >>>
>> >>> Minh
>>
>>

Reply via email to