I have tried to do some testing, but I cannot get a definitive answer about what
works and what doesn't, because there are so many variables. Also, it doesn't fail
right away but after 1h to 1.5h, so I've actually done fewer than a dozen tests.
I'm sorry that I can't be more precise.
The two PCs that I've got are an i3 4th-gen 2C4T with 8GB RAM, and an i7
4th-gen 2C4T with 16GB RAM. The database is stored in a 1TB USB3.0 SSD (which I
move between the two PCs). Either way, the only component that seems to make a
difference is the amount of RAM. They are physical machines (bare metal) and
not containers.
I'm running "fuseki-server" binary, downloaded from the binaries distribution,
"./fuseki-server --loc=database --port=7000 --localhost /query".
I query Fuseki (4.8) over HTTP from a script that runs on the same PC. The
script uses <100MB of RAM. All queries are "read" (no writes) and actually very
basic: either a "DESCRIBE <node>", or a "SELECT" that picks a node and follows a
couple of links, 3 or 4 levels deep at most. Fuseki answers them very quickly,
<5ms, but occasionally a query takes 50-100ms or, rarely, a couple of seconds
(probably because of garbage collection?). The queries are always the same, run
over and over across the dataset. The dataset contains approximately 200K
nodes, 2.5M triples, 4GB disk size. Most nodes have, among their properties, 3
that contain long strings, approximately 20KB-50KB combined, per node. I don't
query those properties directly, but when I "DESCRIBE" a node they are
retrieved. Fuseki is very fast, but these strings may contribute to the high
memory load (I'm only guessing). I cannot identify any particular query as the
offender, since it's the same bunch of queries run over and over at max speed
(i.e. one after the other, with no wait time); a sketch of the sort of requests
is below. It's pretty much the same amount of work for every query, and it works
perfectly fine until it consumes all the RAM and swap.
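To give an idea, each request is equivalent to something like the following (the
node and property URIs here are made-up placeholders, and my script doesn't
actually use curl, but the pattern is the same):

# a DESCRIBE of a single node (hypothetical URI)
curl -s 'http://localhost:7000/query' \
     --data-urlencode 'query=DESCRIBE <http://example.org/node/12345>'

# a SELECT that follows a couple of links from a node (hypothetical URIs)
curl -s 'http://localhost:7000/query' \
     -H 'Accept: application/sparql-results+json' \
     --data-urlencode 'query=
       SELECT ?x ?y WHERE {
         <http://example.org/node/12345> <http://example.org/link1> ?x .
         ?x <http://example.org/link2> ?y .
       }'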
The only configuration that worked for me (i.e. it completed the job) is -Xmx4G
and no parallelism at all (one request after the other, in series) on 16GB of
RAM (it used up all the RAM available). It seems strange to me that it needs so
much RAM even when all the requests are serialized. Querying in parallel with
16 or 32 threads doesn't seem to make much of a difference to Fuseki, other
than even higher memory consumption (Fuseki answers all queries very quickly,
in milliseconds, until it runs out of memory).
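To be concrete, the launch with the 4GB heap looks something like this (the
fuseki-server script from the binary distribution picks up the JVM_ARGS
environment variable):

JVM_ARGS="-Xmx4G" ./fuseki-server --loc=database --port=7000 --localhost /query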
The memory growth is not instantaneous, and it is not linear. I can see RAM usage
fluctuate over a 3-4GB range (for example between 3GB and 7GB), but the trend is
to use more and more memory. For example, before crashing it would fluctuate
between 10GB and 15GB.
If I increase -Xmx above 4GB, Fuseki is eventually OOM-killed by the kernel.
Below 4GB, Fuseki crashes with a heap error like this (in both cases it fails
well after 1h of work):
10:03:21 WARN QueuedThreadPool :: Job failed
java.lang.OutOfMemoryError: Java heap space
10:03:21 WARN Fuseki :: [152378] RC = 500 : Java heap space: failed reallocation of scalar replaced objects
java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects
10:03:21 INFO Fuseki :: [152378] 500 Server Error (48.115 s)
10:03:23 WARN AbstractConnector :: Accept Failure
java.lang.OutOfMemoryError: Java heap space
10:04:08 WARN QueuedThreadPool :: Job failed
java.lang.OutOfMemoryError: Java heap space
10:04:08 WARN QueuedThreadPool :: Job failed
java.lang.OutOfMemoryError: Java heap space
Exception in thread "HttpClient-2-SelectorManager" java.lang.OutOfMemoryError:
Java heap space
On 8GB of RAM it always fails for me: with -Xmx4G or more it is OOM-killed,
whereas with less it ends with a heap error.
The output of "java -XX:+PrintFlagsFinal -version | grep -i "M..HeapSize"" is
size_t MaxHeapSize = 4175429632 {product} {ergonomic}
size_t ShenandoahSoftMaxHeapSize = 0 {manageable} {default}
openjdk version "11.0.18" 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Debian-1deb11u1, mixed mode, sharing)
I've also tried with OpenJDK 17, with the same results.
I tried Fuseki 3.17 too, but I was getting other JSON-LD errors (probably
related to an old JSON-LD library), so I didn't test it further.
I know that I don't have the latest and greatest hardware, but I think my
database is very small, and I feel like Fuseki should not be using 16GB of RAM
when running a lot of simple queries in series (not in parallel).
One thing that I want to try, but so far haven't, is restarting Fuseki halfway
through the job.
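If I do, it would be something naive along these lines (run-first-half.sh and
run-second-half.sh are just placeholders for splitting my script in two):

# hypothetical sketch: split the job and restart Fuseki in between
JVM_ARGS="-Xmx4G" ./fuseki-server --loc=database --port=7000 --localhost /query &
FUSEKI_PID=$!
sleep 10                      # give Fuseki time to start
./run-first-half.sh           # placeholder: first half of the queries
kill $FUSEKI_PID; wait $FUSEKI_PID
JVM_ARGS="-Xmx4G" ./fuseki-server --loc=database --port=7000 --localhost /query &
FUSEKI_PID=$!
sleep 10
./run-second-half.sh          # placeholder: second half of the queries
kill $FUSEKI_PID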
> Sent: Monday, July 10, 2023 at 1:18 PM
> From: "Andy Seaborne" <[email protected]>
> To: [email protected]
> Subject: Re: OOM Killed
>
> Laura, Dave,
>
> This doesn't sound like the same issue but let's see.
>
> Dave - your situation isn't under high load is it?
>
> - Is it in a container? If so:
> Is it the container being killed OOM or
> Java throwing an OOM exception?
> How much RAM does the container get? How many threads?
>
> - If not a container, how many CPU Threads are there? How many cores?
>
> - Which form of Fuseki are you using?
>
> what does
> java -XX:+PrintFlagsFinal -version \
> | grep -i 'M..HeapSize'
>
> say?
>
> How are you sending the queries to the server?
>
> On 09/07/2023 20:33, Laura Morales wrote:
> > I'm running a job that is submitting a lot of queries to a Fuseki server,
> > in parallel. My problem is that Fuseki is OOM-killed and I don't know how
> > to fix this. Some details:
> >
> > - Fuseki is queried as fast as possible. Queries take around 50-100ms to
> > complete so I think it's serving 10s of queries each second
>
> Are all the queries about the same amount of work, or are some going to
> cause significantly more memory use?
>
> It is quite possible to send queries faster than the server can process
> them - there is little point sending in parallel more than there are
> real CPU threads to service them.
>
> They will interfere and the machine can end up going slower (in terms of
> queries per second).
>
> I don't know exactly the impact on the GC, but I think the JVM delays
> minor GCs when very busy, which pushes it to do major ones earlier.
>
> A thing to try is use less parallelism.
>
> > - Fuseki 4.8. OS is Debian 12 (minimal installation with only OS, Fuseki,
> > no desktop environments, uses only ~100MB of RAM)
> > - all the queries are read queries. No updates, inserts, or other write
> > queries
> > - all the queries are over HTTP to the Fuseki endpoint
> > - database is TDB2 (created with tdb2.tdbloader)
> > - database contains around 2.5M triples
> > - the machine has 8GB RAM. I've tried on another PC with 16GB and it
> > completes the job. On 8GB though, it won't
> > - with -Xmx6G it's killed earlier. With -Xmx2G it's killed later. Either
> > way it's always killed.
>
> Is it getting OOM at random, or do certain queries tend to push it over
> the edge?
>
> Is it that the machine (container) has 8G RAM and there is no -Xmx setting?
> In that case, the default setting applies, which is 25% of RAM.
>
> A heap dump to know where the memory is going would be useful.
>
> > Is there anything that I can tweak to avoid Fuseki getting killed?
> > Something that isn't "just buy more RAM".
> > Thank you
>