Minh,

ReplaceText uses a lot of memory if you set “Evaluation Mode” to “Entire Text”,
because it does the same thing as your Groovy script: it reads the entire
content into memory and then evaluates it. If you set it to “Line-by-Line”, it
evaluates the text one line at a time and is very efficient (assuming your
input is not something like a huge 189 MB JSON with no newlines).
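
To make the difference between the two modes concrete, here is a minimal sketch in plain Groovy. It is not ReplaceText's actual implementation; the function names evaluateEntireText and evaluateLineByLine are hypothetical and only illustrate the two strategies. The first needs the whole content as one String, the second holds a single line at a time.

```groovy
import java.util.regex.Pattern

// "Entire Text" style (illustration only): the caller must already hold
// the complete content in memory as a single String.
String evaluateEntireText(String content, Pattern pattern, String replacement) {
    return pattern.matcher(content).replaceAll(replacement)
}

// "Line-by-Line" style (illustration only): each line is read, rewritten
// and written out before the next one is read, so memory use is bounded
// by the longest line rather than by the file size.
// Note: line endings are normalized to '\n' in this sketch.
void evaluateLineByLine(Reader input, Writer output, Pattern pattern, String replacement) {
    input.eachLine { String line ->
        output.write(pattern.matcher(line).replaceAll(replacement))
        output.write('\n')
    }
}
```

With a file that has no newlines at all, the second function degrades to the first, since the single "line" is the whole file.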

Thanks
-Mark


On Nov 14, 2024, at 12:30 PM, e-soci...@gmx.fr wrote:


I'm a bit lost. In the previous email, somebody told me to avoid using
ReplaceText because it consumes a lot of memory.

In my use case, I receive a lot of files. Some of them contain 50,000 lines
and are around 189 MB in size.
I have already tried ReplaceText and got the same OutOfMemoryError.

Using SplitText, ReplaceText, and MergeContent together seems to be an
anti-pattern, but perhaps it is the only solution in my case.

Regards

Sent: Thursday, November 14, 2024 at 17:12
From: "Mark Payne" <marka...@hotmail.com>
To: "users@nifi.apache.org" <users@nifi.apache.org>
Cc: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Java heap space: java.lang.OutOfMemoryError: Java heap space
Minh,

It looks like you’re simply using a regex to modify the contents of the file.
I recommend you take a look at ReplaceText and avoid Groovy altogether.

Thanks
Mark



On Nov 14, 2024, at 10:42 AM, e-soci...@gmx.fr wrote:


Hello,

Yes, I probably need to write Groovy code that reads line by line and replaces
the contents before writing them back.

If any expert could help me, it would be appreciated :)

Minh


Sent: Thursday, November 14, 2024 at 15:22
From: "Joe Witt" <joe.w...@gmail.com>
To: users@nifi.apache.org
Subject: Re: Java heap space: java.lang.OutOfMemoryError: Java heap space
Hello

The code shown is very simple but not memory efficient.

The first call, IOUtils.toString, takes a stream and converts it into a full
Java String object in memory. So, put simply, if the input is 190 MB then you
have at least that much on the Java heap. The later call to text.getBytes
then does the same again, even if only temporarily.

You want to do these changes in batches or using mechanisms that allow it to 
happen in streaming fashion.  There are a lot of parts of NiFi that do such 
things to ensure efficient memory usage.
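
As one possible illustration of the streaming approach, the sketch below prefixes each line with the timestamp field while reading and writing one line at a time. The helper name addTimestamp is hypothetical, and how inputStream and outputStream are obtained from the flowfile is assumed to be the same as in the original script. Line endings are normalized to '\n' here, which may or may not matter for your data.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper: copies inputStream to outputStream line by line,
// adding a "timestamp" field at the start of every line. Only one line
// is held in memory at a time.
void addTimestamp(InputStream inputStream, OutputStream outputStream, String timestamp) {
    String prefix = "\"timestamp\":\"${timestamp}\","
    def reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))
    def writer = new BufferedWriter(new OutputStreamWriter(outputStream, StandardCharsets.UTF_8))
    String line
    while ((line = reader.readLine()) != null) {
        writer.write(prefix)
        writer.write(line)
        writer.write('\n')
    }
    // Flush buffered output; closing the streams is left to the caller.
    writer.flush()
}
```

This only helps when the file really contains line breaks. A single 190 MB line would still be read into memory in one piece.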

Others more familiar with Groovy, etc. can certainly provide pointers.

Thanks

On Thu, Nov 14, 2024 at 7:03 AM <e-soci...@gmx.fr> wrote:
Hello all,

Why do I get an out-of-memory error while processing a file with
ExecuteGroovyScript?

The file size can reach a maximum of 190 MB.

Error : ExecuteGroovyScript[id=07e5314d-b20a-1076-882b-54b44baca66d] 
java.lang.OutOfMemoryError: Java heap space: java.lang.OutOfMemoryError: Java 
heap space

The Groovy script is very simple:

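        // read the whole flowfile content into one String
        // (IOUtils is assumed to be org.apache.commons.io.IOUtils)
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

        // add the timestamp at the start of each line of the text/flowfile;
        // (?m) is needed so that ^ matches every line, not only the first
        text = text.replaceAll(/(?m)^/, "\"timestamp\":\"$timestamp\",")

        // write the modified text back to the flowfile
        outputStream.write(text.getBytes(StandardCharsets.UTF_8))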

Could you help me if this is not the right way to do this?

Thanks

Minh
