For very high-performance flows it can be useful to set Run Schedule = 0s and Run Duration = 25ms. This tells the framework to process as many flowfiles as possible for 25 ms whenever the processor gets a thread to run. A longer run duration usually isn't helpful, as it would block threads for long periods and impact other processors in the flow.
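To make the interplay concrete, here's a toy simulation of those two settings (class and method names are invented for illustration; this is not NiFi's actual scheduler code): each trigger batches as much work as fits in the run duration, and the run schedule is the idle gap between triggers.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of NiFi's "Run Schedule" vs "Run Duration" interplay.
// Illustrative sketch only; names and structure are invented, not NiFi internals.
public class SchedulingSketch {

    // One trigger: keep polling flowfiles until the queue is empty
    // or the run duration elapses. Returns how many were processed.
    static int trigger(Queue<String> queue, long runDurationMs) {
        long deadline = System.nanoTime() + runDurationMs * 1_000_000;
        int processed = 0;
        while (!queue.isEmpty() && System.nanoTime() < deadline) {
            queue.poll();   // "process" the flowfile
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<String> queue = new ArrayDeque<>();
        for (int i = 0; i < 10; i++) queue.add("flowfile-" + i);

        long runScheduleMs = 0;  // 0s: eligible to run again immediately
        long runDurationMs = 25; // batch as much work as fits in 25 ms

        while (!queue.isEmpty()) {
            int n = trigger(queue, runDurationMs);
            System.out.println("trigger processed " + n + " flowfiles");
            Thread.sleep(runScheduleMs); // with Run Schedule = 5s this gap would be 5000 ms
        }
    }
}
```

With Run Schedule = 5s and Run Duration = 2s (Mark's example below), the sleep would be 5000 ms and the deadline 2000 ms: up to 2 seconds of work, then 5 seconds of nothing, repeated.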
On Fri, Nov 8, 2024 at 10:32, <[email protected]> wrote:

> Thanks a lot!
>
> You made my day with these videos.
>
> [3] If "Run Schedule" = 0 seconds, we don't need to change the "Run
> Duration" value, right?
>
> Thanks
>
> Minh
>
>
> *Sent:* Thursday, November 7, 2024 at 18:12
> *From:* "Mark Payne" <[email protected]>
> *To:* "[email protected]" <[email protected]>
> *Subject:* Re: Caused by: java.lang.OutOfMemoryError: Java heap space
> OK so given that, the issue is almost certainly because you're promoting
> huge chunks of JSON into attributes using EvaluateJsonPath.
> You'll want to avoid putting anything larger than a few hundred characters
> into attributes. Instead, lean into using Record-based processors
> in order to manipulate the contents of the FlowFiles as they are, without
> creating attributes from content. EvaluateJsonPath is helpful for creating
> attributes from small JSON fields so that you can perform routing, etc., but
> should not be used to create large attributes. [1]
>
> I also see in your canvas that you have several load-balanced connections,
> which you should avoid [2].
>
> Re: the relationship between "Run Schedule" and "Run Duration" - Run
> Schedule indicates how long to wait between triggerings of the Processor. Run
> Duration says how long to run the Processor each time it's scheduled to
> run. So if Run Schedule = 5 seconds and Run Duration = 2 seconds, the
> Processor will run for up to 2 seconds. Then it will not run again for 5
> seconds. Then it will run for 2 seconds. Then it will do nothing for 5
> seconds. In practice, Processors should almost always have a Run Schedule
> of 0 seconds, except for source processors. See [3] for more details.
>
> Thanks
> -Mark
>
> [1] https://www.youtube.com/watch?v=RjWstt7nRVY&t=187
> [2] https://www.youtube.com/watch?v=by9P0Zi8Dk8
> [3] https://www.youtube.com/watch?v=pZq0EbfDBy4
>
>
> On Nov 7, 2024, at 3:49 AM, [email protected] wrote:
>
> Here is the configuration for EvaluateJsonPath and ReplaceText.
>
> Another question about "Run Schedule" and "Run Duration":
> I know how each of them works on its own, but how do they
> work together?
>
> I mean, if "Run Schedule" is set to 0s and "Run Duration" is set to 2s,
> does that mean the processor is always running?
> How does one affect the other?
>
> Thanks a lot
>
> Minh
>
> *Sent:* Wednesday, November 6, 2024 at 16:13
> *From:* "Mark Payne" <[email protected]>
> *To:* "[email protected]" <[email protected]>
> *Subject:* Re: Caused by: java.lang.OutOfMemoryError: Java heap space
> OK so the decompress should be CPU intensive but not heap/memory
> intensive.
> EvaluateJsonPath will potentially consume large amounts of heap as well,
> depending on how it's configured.
> The ExecuteGroovyScript sounds like it would use very little.
> ReplaceText may well consume huge amounts of heap, depending on how it's
> configured.
>
> Can you share how EvaluateJsonPath and ReplaceText are configured?
>
> The idea that 16 GB of RAM is the max recommended for a JVM was true a
> while ago, but with modern JVMs you can go much higher. That said, given
> the flow described, 4 GB should be more than sufficient if properly
> configured.
>
> Thanks
> -Mark
>
>
> On Nov 6, 2024, at 9:51 AM, [email protected] wrote:
>
> Thanks for the reply, Mark.
>
> The Groovy script is very simple:
>
> hexContent = flowFile.getAttribute('hexContent')
> hexContent = hexContent.decodeHex()
> outputStream.write(hexContent)
>
> The question is how to process flowfiles as quickly as possible.
> If I upgrade to 8 CPUs per node, is it possible to process fewer
> flowfiles concurrently but more flowfiles overall?
>
> The main NiFi dataflow is:
>
> - Uncompress incoming flowfiles (CPU/heap consuming, I suppose)
> - ReplaceText (heap consuming)
> - EvaluateJsonPath (heap consuming)
> - ExecuteGroovyScript (heap consuming)
>
>
> I read that 16 GB of RAM is the maximum recommended for a JVM and that
> adding more isn't beneficial.
> Is that true, or can I increase it to 32 GB?
>
> Regards
>
> Minh
>
> *Sent:* Wednesday, November 6, 2024 at 15:24
> *From:* "Mark Payne" <[email protected]>
> *To:* "[email protected]" <[email protected]>
> *Subject:* Re: Caused by: java.lang.OutOfMemoryError: Java heap space
> Hi Minh,
>
> It is possible that the heap is being exhausted by EvaluateJsonPath if you
> are using it to add large JSON chunks as attributes. For example, if you're
> creating an attribute from `$.` to put the entire JSON contents into
> attributes. Generally, attributes should be kept pretty small.
>
> Otherwise, based on the flow described, the issue is almost certainly
> within the ExecuteGroovyScript. There, there's not much guidance we can
> provide, as it's running your own script. You'd need to understand what in
> your own script is using up all of the heap.
>
> Thanks
> -Mark
>
>
> On Nov 6, 2024, at 4:26 AM, [email protected] wrote:
>
> Hello all,
>
> We have a cluster with 10 nodes (4 CPU / 16 GB) - NiFi 1.25 - jdk-11.0.19.
>
> We use this cluster to send data to a GCP bucket. The data is sent by
> other clusters, so we do S2S between them.
>
> I can't determine where the issue is. This message could be raised
> by EvaluateJsonPath / ExecuteGroovyScript / UpdateAttribute.
> We have around 100,000 flowfiles (160 GB of data).
> We need to configure more than 1 task for each processor to run faster,
> but we always get this error.
>
>
> <evaluateJsonPath.png><evaluateJsonPath2.png><out_of_memory.png>
> <replaceText.png><replaceText2.png>
>
>
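The root cause the thread converges on is worth a concrete sketch: the Groovy script reads the entire payload out of a `hexContent` attribute, so the whole content lives in the heap as one giant String (hex encoding already doubles its size). The fix Mark suggests is to keep the payload in the content stream and process it in a streaming fashion. Here is a minimal plain-Java illustration of a streaming hex decode (this is not the NiFi processor API; the class and method names are invented for the example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Streaming hex decode: reads hex characters from an input stream and
// writes raw bytes to an output stream, so the full payload never has to
// sit in the heap as a single String (as it does when the hex text is
// stored in a FlowFile attribute). Plain-Java sketch, not NiFi's API.
public class StreamingHexDecode {

    static void decode(InputStream in, OutputStream out) throws IOException {
        int hi;
        while ((hi = in.read()) != -1) {
            int lo = in.read();
            if (lo == -1) throw new IOException("odd number of hex digits");
            // Combine the two hex digits into one output byte.
            out.write((Character.digit(hi, 16) << 4) | Character.digit(lo, 16));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] hex = "48656c6c6f".getBytes(StandardCharsets.US_ASCII); // hex for "Hello"
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(hex), decoded);
        System.out.println(decoded.toString("UTF-8"));
    }
}
```

In a NiFi script the input/output streams would come from the flowfile's content callbacks rather than byte arrays; the point is that memory use stays constant regardless of payload size, instead of scaling with the largest flowfile times the number of concurrent tasks.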
