Hi Andy,
Thanks for looking.
Good thought on some issue with stacked requests causing a thread leak, but
we don't think that matches our data.
From the metrics, the number of threads and the total thread memory used
are not that large, and both are stable long term while the process size
grows, at least in our situation.
This is based both on the JVM metrics from the Prometheus scrape and on
switching on native memory tracking and using jcmd to do various low-level
dumps.
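For reference, a rough sketch of those checks (the PID and the way the JVM
flag gets passed in are assumptions; NMT has to be enabled before the
process starts):

JVM_ARGS="-XX:NativeMemoryTracking=summary" ./fuseki-server   # enable NMT at launch
jcmd <pid> VM.native_memory summary   # native memory breakdown of the running process
jcmd <pid> GC.heap_info               # heap occupancy snapshot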
In a test setup we can replicate the long-term (~3 hours) process
growth (while the heap, non-heap and threads stay stable) by just doing
something like:
watch -n 1 'curl -s http://localhost:3030/$/metrics'
With no other requests at all. So I think that makes it less likely the
root cause is triggered by stacked concurrent requests. Certainly the
curl process has exited completely each time, though I guess there could
be some connection cleanup still going on in the Linux kernel.
> Is the OOM kill the container runtime or Java exception?
We're not limiting the container memory, but the OOM error is from the
Docker runtime itself:
fatal error: out of memory allocating heap arena map
We have replicated the memory growth outside a container, but we haven't
left that to soak on a small machine to provoke an OOM, so we're not sure
whether the OOM killer would hit first or we'd get a Java OOM exception first.
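If it's useful to force that question sooner, a sketch of the kind of capped
soak we could run (the 1g limit and image name are placeholders, not
something we've done yet):

docker run -m 1g <our-fuseki-image>   # cap the container so either the OOM killer or a Java OOM shows up quickly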
One curiosity we've found in the recent tests is that, when the process
has grown to a dangerous level for the server, we do sometimes randomly
see the JVM (Temurin 17.0.7) spit out a thread dump and heap summary as
if there were a low level exception. However, there's no exception
message at all - just a timestamp, the thread dump and nothing else. The
JVM seems to just carry on and the process doesn't exit. We're not
setting any debug flags and not requesting any thread dump, and there's
no obvious triggering event. This happens before the server gets so
completely out of memory that the Docker runtime barfs.
Dave
On 07/07/2023 09:56, Andy Seaborne wrote:
I tried running without any datasets. I get the same heap effect of
growing slowly then dropping back.
Fuseki Main (fuseki-server did the same but the figures are from main -
there is less going on)
Version 4.8.0
fuseki -v --ping --empty # No datasets
4G heap.
71M allocated
4 threads (+ Daemon system threads)
2 are not parked (i.e. they are blocked)
The heap grows slowly to 48M then a GC runs then drops to 27M
This repeats.
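(For anyone reproducing this: that grow-and-drop cycle can be watched from
outside the JVM with something like the line below, sampling once a second;
VisualVM shows the same picture.)

jstat -gcutil <pid> 1000    # GC/heap utilisation every 1000 ms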
Run one ping.
Heap now 142M, 94M/21M GC cycle
and 2 more threads, at least for a while. They seem to go away after a time.
2 are not parked.
Now pause the JVM process, queue 100 pings and continue the process.
Heap now 142M, 80M/21M GC cycle
and no more threads.
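(A sketch of how that pause/queue/continue step can be done - the PID
handling and the exact ping URL are assumptions based on the default port:)

kill -STOP <pid>    # pause the JVM
for i in $(seq 100); do
  curl -s 'http://localhost:3030/$/ping' > /dev/null &    # queue pings against the stopped server
done
kill -CONT <pid>    # resume; the queued requests arrive together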
Thread stacks are not heap so there may be something here.
Same except -Xmx500M
RSS is 180M
Heap is 35M actual.
56M/13M heap cycle
and after one ping:
I saw 3 more threads, and one quickly exited.
2 are not parked
100 concurrent ping requests.
Maybe 15 more threads. 14 parked. One is marked "running" by visualvm.
RSS is 273M
With -Xmx250M -Xss170k
The Fuseki command failed below 170k during classloading.
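(For reference, one way those JVM limits can be passed when launching the
server directly - the jar name is a placeholder for whichever Fuseki jar
is in use:)

java -Xmx250M -Xss170k -jar <fuseki-jar> -v --ping --empty # No datasets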
1000 concurrent ping requests.
Maybe 15 more threads. 14 parked. One is marked "running" by visualvm.
The threads aren't being gathered.
RSS is 457M.
So a bit of speculation:
Is the OOM kill the container runtime or Java exception?
There aren't many moving parts.
Maybe under some circumstances the metrics gatherer or ping caller
causes more threads. This could be bad timing, several operations
arriving at the same time, or it could be that the client end isn't
releasing the HTTP connection in a timely manner or is delayed/failing
to read the entire response. That's for HTTP/1.1 -- HTTP/2 probably
isn't at risk.
Together with a dataset, memory-mapped files etc., that pushes the
process size up, and on a small machine it might become a problem,
especially if the container host is limiting RAM.
But speculation.
Andy