Agree on using sane defaults.

On Fri, Jan 5, 2024 at 11:52 AM Mark Payne <marka...@hotmail.com> wrote:
> Thanks for following up. That actually makes sense. I don't think Output
> Batch Size will play a very big role here. But Fetch Size, if I understand
> correctly, is essentially telling the JDBC driver "Here's how many rows
> you should pull back at once." And so it's going to buffer all of those
> rows into memory until it has written out all of them.
>
> So if you set Fetch Size = 0, it's going to pull back all rows in your
> database into memory. To be honest, I cannot imagine a single scenario
> where that's desirable. We should probably set the default to something
> reasonable, like 1,000 or 10,000 at most. And in 2.0, where we have the
> ability to migrate old configurations, we should automatically change any
> config that has a Fetch Size of 0 to the default value.
>
> @Matt Burgess, et al., any concerns with that?
>
> Thanks
> -Mark
>
>
> On Jan 5, 2024, at 9:45 AM, e-soci...@gmx.fr wrote:
>
> So after some tests, here are the results; perhaps they can help someone.
>
> With NiFi (2 CPU / 8 GB RAM) I tested these property combinations:
>
> 1. ExecuteSQL with "select * from table"
>    Output Batch Size: 1
>    Fetch Size: 100000
>
> 2. ExecuteSQL with "select * from table"
>    Output Batch Size: 1
>    Fetch Size: 200000
>
> 3. ExecuteSQL with "select * from table"
>    Output Batch Size: 1
>    Fetch Size: 400000
>    (with 5 ExecuteSQL processors started at the same time)
>
> The 5 processors work perfectly and each produces an Avro file of the
> same size. During the test, memory stays stable and the Web UI works
> perfectly.
>
> FAILED TEST ("out of memory") with these properties:
>
> ExecuteSQL with "select * from table"
> Output Batch Size: 0
> Fetch Size: 0
>
> Regards
>
>
> Sent: Friday, January 5, 2024 at 08:12
> From: "Matt Burgess" <mattyb...@apache.org>
> To: users@nifi.apache.org
> Subject: Re: Hardware requirement for NIFI instance
>
> You may not need to merge if your Fetch Size is set appropriately. For
> your case I don't recommend setting Max Rows Per Flow File, because you
> still have to wait for all the results to be processed before the
> FlowFile(s) get sent "downstream". Also, if you set Output Batch Size you
> can't use Merge downstream, as ExecuteSQL will send FlowFiles downstream
> before it knows the total count.
>
> If you have a NiFi cluster and not a standalone instance, you MIGHT be
> able to represent your complex query using GenerateTableFetch and use a
> load-balanced connection to grab different "pages" of the table in
> parallel with ExecuteSQL. Those can be merged later as long as you get
> all the FlowFiles back to a single node. Depending on how complex your
> query is, it's a long shot, but I thought I'd mention it just in case.
>
> Regards,
> Matt
>
>
> On Thu, Jan 4, 2024 at 1:41 PM Pierre Villard
> <pierre.villard...@gmail.com> wrote:
> >
> > You can merge multiple Avro flow files with MergeRecord with an Avro
> > Reader and an Avro Writer.
> >
> > On Thu, Jan 4, 2024 at 22:05, <e-soci...@gmx.fr> wrote:
> >>
> >> And the important thing for us is to have only one Avro file per table.
> >>
> >> So is it possible to merge the Avro files into one Avro file?
> >>
> >> Regards
> >>
> >>
> >> Sent: Thursday, January 4, 2024 at 19:01
> >> From: e-soci...@gmx.fr
> >> To: users@nifi.apache.org
> >> Cc: users@nifi.apache.org
> >> Subject: Re: Hardware requirement for NIFI instance
> >>
> >> Hello all,
> >>
> >> Thanks a lot for the replies.
> >>
> >> Here are some more details.
> >>
> >> All the properties for ExecuteSQL are set to their defaults, except
> >> "Set Auto Commit: false".
> >>
> >> The SQL command could not be simpler: "select * from ${db.table.fullname}"
> >>
> >> The NiFi versions are 1.16.3 and 1.23.2.
> >>
> >> I have also tested the same SQL command on another NiFi instance
> >> (8 cores / 16 GB RAM) and it works. The result is a 1.6 GB Avro file.
> >>
> >> Details about the output FlowFile:
> >>
> >> executesql.query.duration
> >> 245118
> >> executesql.query.executiontime
> >> 64122
> >> executesql.query.fetchtime
> >> 180996
> >> executesql.resultset.index
> >> 0
> >> executesql.row.count
> >> 14961077
> >>
> >> File Size
> >> 1.62 GB
> >>
> >> Regards
> >>
> >> Minh
> >>
> >>
> >> Sent: Thursday, January 4, 2024 at 17:18
> >> From: "Matt Burgess" <mattyb...@apache.org>
> >> To: users@nifi.apache.org
> >> Subject: Re: Hardware requirement for NIFI instance
> >>
> >> If I remember correctly, the default Fetch Size for PostgreSQL is to
> >> get all the rows at once, which can certainly cause the problem.
> >> Perhaps try setting Fetch Size to something like 1000 or so and see if
> >> that alleviates the problem.
> >>
> >> Regards,
> >> Matt
> >>
> >> On Thu, Jan 4, 2024 at 8:48 AM Etienne Jouvin <lapinoujou...@gmail.com>
> >> wrote:
> >> >
> >> > Hello.
> >> >
> >> > I also think the problem is more about the processor, presumably
> >> > ExecuteSQL.
> >> >
> >> > You should play with the batch configuration and the commit flag to
> >> > commit intermediate FlowFiles.
> >> >
> >> > The out-of-memory exception makes me believe the full table is
> >> > retrieved, and if it is huge, the FlowFile content is very large.
> >> >
> >> >
> >> > On Thu, Jan 4, 2024 at 14:37, Pierre Villard
> >> > <pierre.villard...@gmail.com> wrote:
> >> >>
> >> >> It should be memory efficient, so I think this is likely a
> >> >> configuration aspect of your processor. Can you share the
> >> >> configuration for all properties?
> >> >> As a side note: if NiFi ran out of memory, you'd always want to
> >> >> restart it, because you are never sure what the state of the JVM is
> >> >> after an OOME.
> >> >>
> >> >> On Thu, Jan 4, 2024 at 17:26, <e-soci...@gmx.fr> wrote:
> >> >>>
> >> >>> Hello all,
> >> >>>
> >> >>> Who could help me determine the CPU/memory needed for a NiFi
> >> >>> instance to fetch data from a PostgreSQL database hosted in Google?
> >> >>>
> >> >>> We got this error:
> >> >>> ==> Error: executesql.error.message
> >> >>> Ran out of memory retrieving query results.
> >> >>>
> >> >>> The ExecuteSQL processor has this config: Set Auto Commit ==> false
> >> >>> Driver JAR used: postgresql-42.7.1.jar
> >> >>> Java version: jdk-11.0.19
> >> >>>
> >> >>> Table information:
> >> >>> row count: 14958836
> >> >>> field count: 20
> >> >>>
> >> >>> Linux Rocky8
> >> >>>
> >> >>> Architecture: x86_64
> >> >>> CPU op-mode(s): 32-bit, 64-bit
> >> >>> Byte Order: Little Endian
> >> >>> CPU(s): 2
> >> >>> On-line CPU(s) list: 0,1
> >> >>> Thread(s) per core: 2
> >> >>> Core(s) per socket: 1
> >> >>> Socket(s): 1
> >> >>> NUMA node(s): 1
> >> >>> Vendor ID: GenuineIntel
> >> >>> BIOS Vendor ID: Google
> >> >>> CPU family: 6
> >> >>> Model: 85
> >> >>> Model name: Intel(R) Xeon(R) CPU @ 2.80GHz
> >> >>> Stepping: 7
> >> >>> CPU MHz: 2800.286
> >> >>> BogoMIPS: 5600.57
> >> >>> Hypervisor vendor: KVM
> >> >>> Virtualization type: full
> >> >>> L1d cache: 32K
> >> >>> L1i cache: 32K
> >> >>> L2 cache: 1024K
> >> >>> L3 cache: 33792K
> >> >>> NUMA node0 CPU(s): 0,1
> >> >>>
> >> >>> Memory: 8 GB
> >> >>>
> >> >>> Thanks for your help
> >> >>>
> >> >>> Minh
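For anyone who finds this thread later, the behaviour Mark and Matt describe
comes down to how the PostgreSQL JDBC driver treats fetch size: with
autocommit on (or a fetch size of 0) the driver materializes the whole result
set in memory, while with autocommit off and a positive fetch size it streams
rows through a server-side cursor. That is why "Set Auto Commit: false" plus a
sane Fetch Size keeps ExecuteSQL's heap usage flat. A minimal standalone
sketch of the same idea in plain JDBC, outside NiFi; the connection URL,
credentials, and table name are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class FetchSizeDemo {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; adjust for your environment.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

                // Required for cursor-based fetching in the PostgreSQL driver:
                // with autocommit on, setFetchSize() is ignored and the whole
                // result set is buffered in memory.
                conn.setAutoCommit(false);

                try (Statement stmt = conn.createStatement()) {
                    // Pull rows back 1000 at a time instead of all at once.
                    stmt.setFetchSize(1000);

                    try (ResultSet rs = stmt.executeQuery("select * from my_table")) {
                        long count = 0;
                        while (rs.next()) {
                            count++; // process each row; only ~1000 rows are held at a time
                        }
                        System.out.println("Rows read: " + count);
                    }
                }
            }
        }
    }

As far as I know, ExecuteSQL's Fetch Size property is ultimately handed to
statement.setFetchSize() in the same way, and the PostgreSQL driver also
accepts a defaultRowFetchSize connection parameter if you prefer to set it on
the JDBC URL.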
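And on the "one Avro file per table" requirement: MergeRecord with an Avro
reader and writer, as Pierre suggests, is the NiFi-native way to do it. If you
ever need the same concatenation outside NiFi, a rough sketch using the Apache
Avro Java API might look like the following; the file names are placeholders,
and it assumes all parts were written with the same schema:

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    import java.io.File;
    import java.util.List;

    public class AvroConcat {
        public static void main(String[] args) throws Exception {
            // Placeholder input/output files; all inputs must share one schema.
            List<File> inputs = List.of(new File("part1.avro"), new File("part2.avro"));
            File output = new File("merged.avro");

            Schema schema = null;
            try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<>())) {
                for (File in : inputs) {
                    try (DataFileReader<GenericRecord> reader =
                             new DataFileReader<>(in, new GenericDatumReader<>())) {
                        if (schema == null) {
                            // Take the schema from the first part and create the output file.
                            schema = reader.getSchema();
                            writer.create(schema, output);
                        }
                        // Append every record from this part to the merged file.
                        for (GenericRecord rec : reader) {
                            writer.append(rec);
                        }
                    }
                }
            }
        }
    }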