Thanks for the reply, Mark,
The Groovy script is very simple:
hexContent = flowFile.getAttribute('hexContent')
hexContent = hexContent.decodeHex()
outputStream.write(hexContent)
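For illustration, here is a rough equivalent of that decode step in plain Java (a sketch, not the NiFi API; `hexDecode` is a hypothetical helper standing in for Groovy's `decodeHex()`). It makes the memory cost visible: the payload sits on the heap twice, once as the attribute String and once as the decoded byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class HexDecodeSketch {
    // Decode a hex string into bytes, analogous to Groovy's String.decodeHex()
    static byte[] hexDecode(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // The attribute value: the whole payload held on the heap as a String
        String hexContent = "48656c6c6f"; // "Hello"
        OutputStream outputStream = new ByteArrayOutputStream();
        // Decoded once, then written; decoding allocates a second full copy
        outputStream.write(hexDecode(hexContent));
        System.out.println(((ByteArrayOutputStream) outputStream).toString("UTF-8"));
    }
}
```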
The question is how to process FlowFiles as quickly as possible.
If I upgrade to 8 CPUs per node, is it possible to process fewer FlowFiles at the same time but achieve higher overall throughput?
The main NiFi dataflow is:
- Uncompress incoming FlowFiles (CPU/heap consuming, I suppose)
- ReplaceText (heap consuming)
- EvaluateJsonPath (heap consuming)
- ExecuteGroovyScript (heap consuming)
I read that 16GB of RAM is the maximum recommended for a JVM and that adding more isn’t beneficial.
Is that true, or can I increase it to 32GB?
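For reference, the NiFi JVM heap is configured in conf/bootstrap.conf. A sketch of what the relevant lines look like (the java.arg.2/java.arg.3 property names below match the file shipped with NiFi 1.x, but check your own copy, as the numbering can differ):

```
# conf/bootstrap.conf - JVM heap settings (defaults are 512m)
java.arg.2=-Xms16g
java.arg.3=-Xmx16g
```

One caveat worth knowing: at -Xmx32g and above, the JVM disables compressed ordinary object pointers, so every object reference doubles in size. For that reason heaps are often capped around 31g rather than 32g. A larger heap also won't help if some processor's memory use scales with queue depth; in that case the queue will simply exhaust whatever heap you give it.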
Regards
Minh
Sent: Wednesday, November 6, 2024 at 15:24
De: "Mark Payne" <[email protected]>
À: "[email protected]" <[email protected]>
Objet: Re: Caused by: java.lang.OutOfMemoryError: Java heap space
Hi Minh,
It is possible that the heap is being exhausted by EvaluateJsonPath if you are using it to add large JSON chunks as attributes. For example, if you’re creating an attribute from `$.` to put the entire JSON contents into attributes. Generally, attributes should be kept pretty small.
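As a back-of-envelope illustration of why this matters (the 100,000 FlowFiles / 160 GB figures come from the original message below; the rest is rough arithmetic, not NiFi internals):

```java
public class AttributeHeapCost {
    public static void main(String[] args) {
        long flowFiles = 100_000L;      // queue depth from the original message
        long avgAttrBytes = 1_600_000L; // ~1.6 MB per FlowFile if the whole JSON
                                        // content is copied into an attribute
                                        // (160 GB spread over 100k files)
        // Attributes of FlowFiles live in a queue are held on the JVM heap
        // (NiFi swaps very deep queues to disk, but the principle holds),
        // so per-FlowFile attribute size multiplies by queue depth:
        long heapBytes = flowFiles * avgAttrBytes;
        System.out.println(heapBytes / 1_000_000_000L + " GB of heap just for attributes");
    }
}
```

That is roughly 160 GB of heap for attributes alone, far beyond a 16 GB (or even 32 GB) heap, which is why keeping attributes small is usually the real fix rather than adding memory.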
Otherwise, based on the flow described, the issue is almost certainly within ExecuteGroovyScript. There's not much guidance we can provide there, as it's running your own script. You'd need to understand what in your script is using up all of the heap.
Thanks
-Mark
On Nov 6, 2024, at 4:26 AM, [email protected] wrote:

Hello all,

We have a cluster with 10 nodes (4 CPU / 16 GB) - NiFi 1.25 - jdk-11.0.19. We use this cluster to send data to a GCP bucket; the data are sent by other clusters, so we do S2S between them.

I can't determine where the issue is. The message could be raised by EvaluateJsonPath, ExecuteGroovyScript, or UpdateAttribute. We have around 100,000 FlowFiles (160 GB of data). We need to configure more than 1 concurrent task for each processor to run faster, but we always get this error.
