Thanks a lot!
You made my day with these videos.
[3] If "Run Schedule" = 0 seconds, we don't need to change the "Run Duration" value, right?
Thanks
Minh
Sent: Thursday, November 7, 2024 at 18:12
From: "Mark Payne" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: Caused by: java.lang.OutOfMemoryError: Java heap space
OK so given that, the issue is almost certainly because you’re promoting huge chunks of JSON into attributes using EvaluateJsonPath.
You’ll want to avoid putting anything larger than a few hundred characters into attributes. Instead, lean into using Record-based processors in order to manipulate the contents of the FlowFiles as they are, without creating attributes from content. EvaluateJsonPath is helpful for creating attributes from small JSON fields so that you can perform routing, etc., but should not be used to create large attributes. [1]
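To see why large attributes exhaust the heap, here is a back-of-the-envelope estimate in plain Java. The attribute size is a hypothetical figure for illustration, not a measurement from this cluster; the queue depth comes from the thread below.

```java
// Rough heap estimate for large FlowFile attributes.
// Attribute maps live on the JVM heap, so big values multiply fast.
public class AttributeHeapEstimate {
    public static void main(String[] args) {
        long flowFiles = 100_000;    // queue depth mentioned in the thread
        long attrBytes = 1_000_000;  // hypothetical ~1 MB JSON chunk per attribute
        // Java strings are UTF-16 internally: roughly 2 bytes per character.
        long heapBytes = flowFiles * attrBytes * 2;
        System.out.println(heapBytes / (1024L * 1024 * 1024) + " GB"); // 186 GB
    }
}
```

Even if each node only holds a tenth of the queue, that dwarfs a 16 GB heap, whereas a short routing key of a few hundred characters per FlowFile costs almost nothing.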
I also see in your canvas that you have several load-balanced connections, which you should avoid [2].
Re: the relationship between “Run Schedule” and “Run Duration” - Run Schedule indicates how long to wait between triggering the Processor. Run Duration says how long to run the Processor each time it’s scheduled to run. So if Run Schedule = 5 seconds and Run Duration = 2 seconds, then the Processor will run for up to 2 seconds. Then it will not run again for 5 seconds. Then it will run for 2 seconds. Then it will do nothing for 5 seconds. In practice, Processors should almost always have a Run Schedule of 0 seconds except for source processors. See [3] for more details.
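The interaction described above can be sketched as a toy timeline in plain Java (no NiFi APIs involved; this only mirrors the 5 s / 2 s example, under the assumption that the idle gap starts after the run window ends):

```java
// Toy timeline: Run Schedule = gap between triggers,
// Run Duration = how long each trigger keeps the processor running.
public class SchedulingTimeline {
    public static void main(String[] args) {
        int runScheduleSec = 5; // wait between triggers
        int runDurationSec = 2; // run for up to this long per trigger
        int t = 0;
        for (int trigger = 1; trigger <= 3; trigger++) {
            System.out.println("t=" + t + "s: run for up to " + runDurationSec + "s");
            t += runDurationSec; // processor runs
            t += runScheduleSec; // then idles until the next trigger
        }
        // With Run Schedule = 0 (the usual setting for non-source
        // processors), the idle gap disappears and the processor is
        // triggered again as soon as work is available.
    }
}
```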
Thanks
-Mark
On Nov 7, 2024, at 3:49 AM, [email protected] wrote:

<evaluateJsonPath.png> <evaluateJsonPath2.png> <out_of_memory.png> <replaceText.png> <replaceText2.png>

Here is the configuration for EvaluateJsonPath and ReplaceText.

Another question about "Run Schedule" and "Run Duration": I know how each of them works on its own, but how do they work together?

I mean, if "Run Schedule" is set to 0s and "Run Duration" is set to 2s, does that mean the processor is always running? How does one impact the other?

Thanks a lot
Minh

Sent: Wednesday, November 6, 2024 at 16:13
From: "Mark Payne" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: Caused by: java.lang.OutOfMemoryError: Java heap space

OK, so the decompress should be CPU intensive but not heap/memory intensive. EvaluateJsonPath will potentially consume large amounts of heap as well, depending on how it’s configured. The ExecuteGroovyScript sounds like it would use very little. ReplaceText may well consume huge amounts of heap, depending on how it’s configured. Can you share how EvaluateJsonPath and ReplaceText are configured?

The idea that 16 GB of RAM is the max recommended for a JVM was true a while ago, but with modern JVMs you can go much higher. That said, given the flow described, 4 GB should be more than sufficient if properly configured.

Thanks
-Mark

On Nov 6, 2024, at 9:51 AM, [email protected] wrote:

Thanks for the reply, Mark.

The groovy script is very simple:

hexContent = flowFile.getAttribute('hexContent')
hexContent = hexContent.decodeHex()
outputStream.write(hexContent)

The question is how to process flowfiles as quickly as possible. If I upgrade to 8 CPUs per node, is it possible to process fewer flowfiles at the same time but more flowfiles overall?

The main nifi dataflow is:
- Uncompress incoming flowfiles (CPU/heap consuming, I suppose)
- ReplaceText (heap consuming)
- EvaluateJsonPath (heap consuming)
- ExecuteGroovyScript (heap consuming)
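For reference, the three-line Groovy script quoted earlier can be sketched in plain Java. This is an illustration only, not NiFi code; the attribute value here is a made-up example. Note that the whole payload passes through the attribute map (and thus the heap) as a hex string, which is exactly the pattern Mark warns against for large content.

```java
// Plain-Java sketch of the Groovy script: read a hex string from an
// attribute, decode it to bytes, write the bytes out as new content.
public class HexDecodeSketch {
    public static void main(String[] args) {
        String hexContent = "48656c6c6f"; // hypothetical attribute value ("Hello")
        byte[] out = new byte[hexContent.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Decode each pair of hex digits into one byte.
            out[i] = (byte) Integer.parseInt(hexContent.substring(2 * i, 2 * i + 2), 16);
        }
        System.out.println(new String(out)); // Hello
    }
}
```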
I read that 16GB of RAM is the maximum recommended for a JVM and that adding more isn’t beneficial.
Is that true, or can I increase it to 32GB?
Regards
Minh

Sent: Wednesday, November 6, 2024 at 15:24
From: "Mark Payne" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: Caused by: java.lang.OutOfMemoryError: Java heap space

Hi Minh,

It is possible that the heap is being exhausted by EvaluateJsonPath if you are using it to add large JSON chunks as attributes. For example, if you’re creating an attribute from `$.` to put the entire JSON contents into attributes. Generally, attributes should be kept pretty small.

Otherwise, based on the flow described, the issue is almost certainly within the ExecuteGroovyScript. There, there’s not much guidance we can provide, as it’s running your own script. You’d need to understand what in your own script is using up all of the heap.

Thanks
-Mark

On Nov 6, 2024, at 4:26 AM, [email protected] wrote:

Hello all,

We have a cluster with 10 nodes (4 CPU / 16 GB) - NiFi 1.25 - jdk-11.0.19. We use this cluster to send data to a GCP bucket; the data is sent by other clusters, so we do S2S between them.

I can't determine where the issue is. This message could be raised by EvaluateJsonPath, ExecuteGroovyScript, or UpdateAttribute. We have around 100,000 flowfiles (160 GB of data). We need to configure more than 1 task for each processor to run faster, but we always get this error.
