For very high-performance flows it can be useful to set Run Schedule = 0s and Run Duration = 25ms. This tells the framework to process as many flowfiles as possible for 25 ms whenever the processor gets a thread to run. A longer run duration usually isn't helpful, as it would block threads for long periods and impact other processors in the flow.
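To make the interplay concrete, here's a toy simulation of those two settings (class and method names are invented for illustration; this is not NiFi's actual scheduler code): each trigger batches as much work as fits in the run duration, and the run schedule is the idle gap between triggers.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of NiFi's "Run Schedule" vs "Run Duration" interplay.
// Illustrative sketch only; names and structure are invented, not NiFi internals.
public class SchedulingSketch {

    // One trigger: keep polling flowfiles until the queue is empty
    // or the run duration elapses. Returns how many were processed.
    static int trigger(Queue<String> queue, long runDurationMs) {
        long deadline = System.nanoTime() + runDurationMs * 1_000_000;
        int processed = 0;
        while (!queue.isEmpty() && System.nanoTime() < deadline) {
            queue.poll();   // "process" the flowfile
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<String> queue = new ArrayDeque<>();
        for (int i = 0; i < 10; i++) queue.add("flowfile-" + i);

        long runScheduleMs = 0;  // 0s: eligible to run again immediately
        long runDurationMs = 25; // batch as much work as fits in 25 ms

        while (!queue.isEmpty()) {
            int n = trigger(queue, runDurationMs);
            System.out.println("trigger processed " + n + " flowfiles");
            Thread.sleep(runScheduleMs); // with Run Schedule = 5s this gap would be 5000 ms
        }
    }
}
```

With Run Schedule = 5s and Run Duration = 2s (Mark's example below), the sleep would be 5000 ms and the deadline 2000 ms: up to 2 seconds of work, then 5 seconds of nothing, repeated.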
On Fri, Nov 8, 2024 at 10:32, <[email protected]> wrote:

> Thanks a lot!
>
> You made my day with these videos.
>
> [3] If "Run Schedule" = 0 seconds, we don't need to change the "Run
> Duration" value, right?
>
> Thanks
>
> Minh
>
>
> *Sent:* Thursday, November 7, 2024 at 18:12
> *From:* "Mark Payne" <[email protected]>
> *To:* "[email protected]" <[email protected]>
> *Subject:* Re: Caused by: java.lang.OutOfMemoryError: Java heap space
> OK so given that, the issue is almost certainly because you're promoting
> huge chunks of JSON into attributes using EvaluateJsonPath.
> You'll want to avoid putting anything larger than a few hundred characters
> into attributes. Instead, lean into using Record-based processors
> in order to manipulate the contents of the FlowFiles as they are, without
> creating attributes from content. EvaluateJsonPath is helpful for creating
> attributes from small JSON fields so that you can perform routing, etc., but
> should not be used to create large attributes. [1]
>
> I also see in your canvas that you have several load-balanced connections,
> which you should avoid [2].
>
> Re: the relationship between "Run Schedule" and "Run Duration" - Run
> Schedule indicates how long to wait between triggerings of the Processor. Run
> Duration says how long to run the Processor each time it's scheduled to
> run. So if Run Schedule = 5 seconds and Run Duration = 2 seconds, the
> Processor will run for up to 2 seconds. Then it will not run again for 5
> seconds. Then it will run for 2 seconds. Then it will do nothing for 5
> seconds. In practice, Processors should almost always have a Run Schedule
> of 0 seconds, except for source processors. See [3] for more details.
>
> Thanks
> -Mark
>
> [1] https://www.youtube.com/watch?v=RjWstt7nRVY&t=187
> [2] https://www.youtube.com/watch?v=by9P0Zi8Dk8
> [3] https://www.youtube.com/watch?v=pZq0EbfDBy4
>
>
> On Nov 7, 2024, at 3:49 AM, [email protected] wrote:
>
> Here is the configuration for EvaluateJsonPath and ReplaceText.
>
> Another question about "Run Schedule" and "Run Duration":
> I know how each of them works on its own, but how do they
> work together?
>
> I mean, if "Run Schedule" is set to 0s and "Run Duration" is set to 2s,
> does that mean the processor is always running?
> How does one affect the other?
>
> Thanks a lot
>
> Minh
>
> *Sent:* Wednesday, November 6, 2024 at 16:13
> *From:* "Mark Payne" <[email protected]>
> *To:* "[email protected]" <[email protected]>
> *Subject:* Re: Caused by: java.lang.OutOfMemoryError: Java heap space
> OK so the decompress should be CPU intensive but not heap/memory
> intensive.
> EvaluateJsonPath will potentially consume large amounts of heap as well,
> depending on how it's configured.
> The ExecuteGroovyScript sounds like it would use very little.
> ReplaceText may well consume huge amounts of heap, depending on how it's
> configured.
>
> Can you share how EvaluateJsonPath and ReplaceText are configured?
>
> The idea that 16 GB of RAM is the max recommended for a JVM was true a
> while ago, but with modern JVMs you can go much higher. That said, given
> the flow described, 4 GB should be more than sufficient if properly
> configured.
>
> Thanks
> -Mark
>
>
> On Nov 6, 2024, at 9:51 AM, [email protected] wrote:
>
> Thanks for the reply, Mark.
>
> The Groovy script is very simple:
>
> hexContent = flowFile.getAttribute('hexContent')
> hexContent = hexContent.decodeHex()
> outputStream.write(hexContent)
>
> The question is how to process flowfiles as quickly as possible.
> If I upgrade to 8 CPUs per node, is it possible to process fewer
> flowfiles concurrently but more flowfiles overall?
>
> The main NiFi dataflow is:
>
> - Uncompress incoming flowfiles (CPU/heap consuming, I suppose)
> - ReplaceText (heap consuming)
> - EvaluateJsonPath (heap consuming)
> - ExecuteGroovyScript (heap consuming)
>
>
> I read that 16 GB of RAM is the maximum recommended for a JVM and that
> adding more isn't beneficial.
> Is that true, or can I increase it to 32 GB?
>
> Regards
>
> Minh
>
> *Sent:* Wednesday, November 6, 2024 at 15:24
> *From:* "Mark Payne" <[email protected]>
> *To:* "[email protected]" <[email protected]>
> *Subject:* Re: Caused by: java.lang.OutOfMemoryError: Java heap space
> Hi Minh,
>
> It is possible that the heap is being exhausted by EvaluateJsonPath if you
> are using it to add large JSON chunks as attributes. For example, if you're
> creating an attribute from `$.` to put the entire JSON contents into
> attributes. Generally, attributes should be kept pretty small.
>
> Otherwise, based on the flow described, the issue is almost certainly
> within the ExecuteGroovyScript. There, there's not much guidance we can
> provide, as it's running your own script. You'd need to understand what in
> your own script is using up all of the heap.
>
> Thanks
> -Mark
>
>
> On Nov 6, 2024, at 4:26 AM, [email protected] wrote:
>
> Hello all,
>
> We have a cluster with 10 nodes (4 CPU / 16 GB) - NiFi 1.25 - jdk-11.0.19.
>
> We use this cluster to send data to a GCP bucket. The data is sent by
> other clusters, so we do S2S between them.
>
> I can't determine where the issue is. This message could be raised
> by EvaluateJsonPath / ExecuteGroovyScript / UpdateAttribute.
> We have around 100,000 flowfiles (160 GB of data).
> We need to configure more than 1 task for each processor to run faster,
> but we always get this error.
>
>
> <evaluateJsonPath.png><evaluateJsonPath2.png><out_of_memory.png>
> <replaceText.png><replaceText2.png>
>
>
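The root cause the thread converges on is worth a concrete sketch: the Groovy script reads the entire payload out of a `hexContent` attribute, so the whole content lives in the heap as one giant String (hex encoding already doubles its size). The fix Mark suggests is to keep the payload in the content stream and process it in a streaming fashion. Here is a minimal plain-Java illustration of a streaming hex decode (this is not the NiFi processor API; the class and method names are invented for the example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Streaming hex decode: reads hex characters from an input stream and
// writes raw bytes to an output stream, so the full payload never has to
// sit in the heap as a single String (as it does when the hex text is
// stored in a FlowFile attribute). Plain-Java sketch, not NiFi's API.
public class StreamingHexDecode {

    static void decode(InputStream in, OutputStream out) throws IOException {
        int hi;
        while ((hi = in.read()) != -1) {
            int lo = in.read();
            if (lo == -1) throw new IOException("odd number of hex digits");
            // Combine the two hex digits into one output byte.
            out.write((Character.digit(hi, 16) << 4) | Character.digit(lo, 16));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] hex = "48656c6c6f".getBytes(StandardCharsets.US_ASCII); // hex for "Hello"
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(hex), decoded);
        System.out.println(decoded.toString("UTF-8"));
    }
}
```

In a NiFi script the input/output streams would come from the flowfile's content callbacks rather than byte arrays; the point is that memory use stays constant regardless of payload size, instead of scaling with the largest flowfile times the number of concurrent tasks.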
