Agree on using sane defaults.

On Fri, Jan 5, 2024 at 11:52 AM Mark Payne <marka...@hotmail.com> wrote:
> Thanks for following up. That actually makes sense. I don't think Output
> Batch Size will play a very big role here. But Fetch Size, if I understand
> correctly, is essentially telling the JDBC driver "Here's how many rows
> you should pull back at once." And so it's going to buffer all of those
> rows into memory until it has written out all of them.
>
> So if you set Fetch Size = 0, it's going to pull back all rows in your
> database into memory. To be honest, I cannot imagine a single scenario
> where that's desirable. We should probably set the default to something
> reasonable, like 1,000 or 10,000 at most. And in 2.0, where we have the
> ability to migrate old configurations, we should automatically change any
> config that has a Fetch Size of 0 to the default value.
>
> @Matt Burgess, et al., any concerns with that?
>
> Thanks
> -Mark
>
>
> On Jan 5, 2024, at 9:45 AM, e-soci...@gmx.fr wrote:
>
> So after some tests, here are the results; perhaps they can help someone.
>
> With NiFi (2 CPU / 8 GB RAM) I tested these property combinations:
>
> 1. ExecuteSQL with "select * from table"
>    Output Batch Size: 1
>    Fetch Size: 100000
>
> 2. ExecuteSQL with "select * from table"
>    Output Batch Size: 1
>    Fetch Size: 200000
>
> 3. ExecuteSQL with "select * from table"
>    Output Batch Size: 1
>    Fetch Size: 400000
>    (with 5 ExecuteSQL processors started at the same time)
>
> The 5 processors work perfectly and each produces an Avro file of the
> same size. During the test, memory stays stable and the Web UI works
> perfectly.
>
> FAILED TEST ("out of memory") with these properties:
>
> ExecuteSQL with "select * from table"
> Output Batch Size: 0
> Fetch Size: 0
>
> Regards
>
>
> Sent: Friday, January 5, 2024 at 08:12
> From: "Matt Burgess" <mattyb...@apache.org>
> To: users@nifi.apache.org
> Subject: Re: Hardware requirement for NIFI instance
>
> You may not need to merge if your Fetch Size is set appropriately. For
> your case I don't recommend setting Max Rows Per Flow File, because you
> still have to wait for all the results to be processed before the
> FlowFile(s) get sent "downstream". Also, if you set Output Batch Size you
> can't use Merge downstream, as ExecuteSQL will send FlowFiles downstream
> before it knows the total count.
>
> If you have a NiFi cluster and not a standalone instance, you MIGHT be
> able to represent your complex query using GenerateTableFetch and use a
> load-balanced connection to grab different "pages" of the table in
> parallel with ExecuteSQL. Those can be merged later as long as you get
> all the FlowFiles back to a single node. Depending on how complex your
> query is, it's a long shot, but I thought I'd mention it just in case.
>
> Regards,
> Matt
>
>
> On Thu, Jan 4, 2024 at 1:41 PM Pierre Villard
> <pierre.villard...@gmail.com> wrote:
> >
> > You can merge multiple Avro flow files with MergeRecord with an Avro
> > Reader and an Avro Writer.
> >
> > On Thu, Jan 4, 2024 at 22:05, <e-soci...@gmx.fr> wrote:
> >>
> >> And the important thing for us is to have only one Avro file per table.
> >>
> >> So is it possible to merge the Avro files into one Avro file?
> >>
> >> Regards
> >>
> >>
> >> Sent: Thursday, January 4, 2024 at 19:01
> >> From: e-soci...@gmx.fr
> >> To: users@nifi.apache.org
> >> Cc: users@nifi.apache.org
> >> Subject: Re: Hardware requirement for NIFI instance
> >>
> >> Hello all,
> >>
> >> Thanks a lot for the replies.
> >>
> >> Here are some more details.
> >>
> >> All the properties for ExecuteSQL are set to their defaults, except
> >> "Set Auto Commit: false".
> >>
> >> The SQL command could not be simpler: "select * from ${db.table.fullname}"
> >>
> >> The NiFi versions are 1.16.3 and 1.23.2.
> >>
> >> I have also tested the same SQL command on another NiFi instance
> >> (8 cores / 16 GB RAM) and it works. The result is a 1.6 GB Avro file.
> >>
> >> Details about the output FlowFile:
> >>
> >> executesql.query.duration
> >> 245118
> >> executesql.query.executiontime
> >> 64122
> >> executesql.query.fetchtime
> >> 180996
> >> executesql.resultset.index
> >> 0
> >> executesql.row.count
> >> 14961077
> >>
> >> File Size
> >> 1.62 GB
> >>
> >> Regards
> >>
> >> Minh
> >>
> >>
> >> Sent: Thursday, January 4, 2024 at 17:18
> >> From: "Matt Burgess" <mattyb...@apache.org>
> >> To: users@nifi.apache.org
> >> Subject: Re: Hardware requirement for NIFI instance
> >>
> >> If I remember correctly, the default Fetch Size for PostgreSQL is to
> >> get all the rows at once, which can certainly cause the problem.
> >> Perhaps try setting Fetch Size to something like 1000 or so and see if
> >> that alleviates the problem.
> >>
> >> Regards,
> >> Matt
> >>
> >> On Thu, Jan 4, 2024 at 8:48 AM Etienne Jouvin <lapinoujou...@gmail.com>
> >> wrote:
> >> >
> >> > Hello.
> >> >
> >> > I also think the problem is more about the processor, presumably
> >> > ExecuteSQL.
> >> >
> >> > You should play with the batch configuration and the commit flag to
> >> > commit intermediate FlowFiles.
> >> >
> >> > The out-of-memory exception makes me believe the full table is
> >> > retrieved, and if it is huge, the FlowFile content is very large.
> >> >
> >> >
> >> > On Thu, Jan 4, 2024 at 14:37, Pierre Villard
> >> > <pierre.villard...@gmail.com> wrote:
> >> >>
> >> >> It should be memory efficient, so I think this is likely a
> >> >> configuration aspect of your processor. Can you share the
> >> >> configuration for all properties?
> >> >> As a side note: if NiFi ran out of memory, you'd always want to
> >> >> restart it, because you are never sure what the state of the JVM is
> >> >> after an OOME.
> >> >>
> >> >> On Thu, Jan 4, 2024 at 17:26, <e-soci...@gmx.fr> wrote:
> >> >>>
> >> >>> Hello all,
> >> >>>
> >> >>> Who could help me determine the CPU/memory needed for a NiFi
> >> >>> instance to fetch data from a PostgreSQL database hosted in Google?
> >> >>>
> >> >>> We got this error:
> >> >>> ==> Error: executesql.error.message
> >> >>> Ran out of memory retrieving query results.
> >> >>>
> >> >>> The ExecuteSQL processor has this config: Set Auto Commit ==> false
> >> >>> Driver JAR used: postgresql-42.7.1.jar
> >> >>> Java version: jdk-11.0.19
> >> >>>
> >> >>> Table information:
> >> >>> row count: 14958836
> >> >>> field count: 20
> >> >>>
> >> >>> Linux Rocky8
> >> >>>
> >> >>> Architecture: x86_64
> >> >>> CPU op-mode(s): 32-bit, 64-bit
> >> >>> Byte Order: Little Endian
> >> >>> CPU(s): 2
> >> >>> On-line CPU(s) list: 0,1
> >> >>> Thread(s) per core: 2
> >> >>> Core(s) per socket: 1
> >> >>> Socket(s): 1
> >> >>> NUMA node(s): 1
> >> >>> Vendor ID: GenuineIntel
> >> >>> BIOS Vendor ID: Google
> >> >>> CPU family: 6
> >> >>> Model: 85
> >> >>> Model name: Intel(R) Xeon(R) CPU @ 2.80GHz
> >> >>> Stepping: 7
> >> >>> CPU MHz: 2800.286
> >> >>> BogoMIPS: 5600.57
> >> >>> Hypervisor vendor: KVM
> >> >>> Virtualization type: full
> >> >>> L1d cache: 32K
> >> >>> L1i cache: 32K
> >> >>> L2 cache: 1024K
> >> >>> L3 cache: 33792K
> >> >>> NUMA node0 CPU(s): 0,1
> >> >>>
> >> >>> Memory: 8 GB
> >> >>>
> >> >>> Thanks for your help
> >> >>>
> >> >>> Minh
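For anyone who finds this thread later, the behaviour Mark and Matt describe
comes down to how the PostgreSQL JDBC driver treats fetch size: with
autocommit on (or a fetch size of 0) the driver materializes the whole result
set in memory, while with autocommit off and a positive fetch size it streams
rows through a server-side cursor. That is why "Set Auto Commit: false" plus a
sane Fetch Size keeps ExecuteSQL's heap usage flat. A minimal standalone
sketch of the same idea in plain JDBC, outside NiFi; the connection URL,
credentials, and table name are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class FetchSizeDemo {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; adjust for your environment.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

                // Required for cursor-based fetching in the PostgreSQL driver:
                // with autocommit on, setFetchSize() is ignored and the whole
                // result set is buffered in memory.
                conn.setAutoCommit(false);

                try (Statement stmt = conn.createStatement()) {
                    // Pull rows back 1000 at a time instead of all at once.
                    stmt.setFetchSize(1000);

                    try (ResultSet rs = stmt.executeQuery("select * from my_table")) {
                        long count = 0;
                        while (rs.next()) {
                            count++; // process each row; only ~1000 rows are held at a time
                        }
                        System.out.println("Rows read: " + count);
                    }
                }
            }
        }
    }

As far as I know, ExecuteSQL's Fetch Size property is ultimately handed to
statement.setFetchSize() in the same way, and the PostgreSQL driver also
accepts a defaultRowFetchSize connection parameter if you prefer to set it on
the JDBC URL.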
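And on the "one Avro file per table" requirement: MergeRecord with an Avro
reader and writer, as Pierre suggests, is the NiFi-native way to do it. If you
ever need the same concatenation outside NiFi, a rough sketch using the Apache
Avro Java API might look like the following; the file names are placeholders,
and it assumes all parts were written with the same schema:

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    import java.io.File;
    import java.util.List;

    public class AvroConcat {
        public static void main(String[] args) throws Exception {
            // Placeholder input/output files; all inputs must share one schema.
            List<File> inputs = List.of(new File("part1.avro"), new File("part2.avro"));
            File output = new File("merged.avro");

            Schema schema = null;
            try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<>())) {
                for (File in : inputs) {
                    try (DataFileReader<GenericRecord> reader =
                             new DataFileReader<>(in, new GenericDatumReader<>())) {
                        if (schema == null) {
                            // Take the schema from the first part and create the output file.
                            schema = reader.getSchema();
                            writer.create(schema, output);
                        }
                        // Append every record from this part to the merged file.
                        for (GenericRecord rec : reader) {
                            writer.append(rec);
                        }
                    }
                }
            }
        }
    }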