Hi Andy,
On 10/07/2023 12:18, Andy Seaborne wrote:
Laura, Dave,
This doesn't sound like the same issue but let's see.
It may well be different; if so, apologies for causing noise.
Dave - your situation isn't under high load is it?
We see the process size growth under no load other than metric scrapes.
However, growth seems faster if there's more traffic (faster scrapes), so we expect
that a high query load will make it worse. Whether "worse" means it just
reaches some asymptote faster or actually goes higher is unproven.
- Is it in a container? If so:
The original problem was in a container. But as I've said we can
reproduce the process growth on bare metal (i.e. local desktop).
Is it the container being killed OOM or
Java throwing an OOM exception?
For the original problem it's the container being OOM killed, no Java
exception.
For local tests, both in a container and on bare metal, we've just been
looking at the process size growth; we haven't run it long enough to reach an
OOM state, and on the size of machine I'm using I doubt it will on a
timescale I can wait for.
How much RAM does the container get? How many threads?
For the original problem the container had no memory limit other than
machine total of 4GB. No constraints set on threads.
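(For comparison, if we did want a cap it would be set on the container itself;
the sketch below is illustrative only - runtime, value and image name are not
our actual deployment:

    docker run --memory=3g --memory-swap=3g ... <fuseki-image>
)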
- If not a container, how many CPU Threads are there? How many cores?
For local tests, 6 cores, which should mean 12 CPU threads, but checking just
now I suspect hyperthreading isn't working on my current install, so call
it 6 of both.
- Which form of Fuseki are you using?
fuseki-server
what does
  java -XX:+PrintFlagsFinal -version \
    | grep -i 'M..HeapSize'
say?
E.g. in the container:
   size_t ErgoHeapSizeLimit           = 0            {product}    {default}
   size_t HeapSizePerGCThread         = 43620760     {product}    {default}
   size_t InitialHeapSize             = 65011712     {product}    {ergonomic}
   size_t LargePageHeapSizeThreshold  = 134217728    {product}    {default}
   size_t MaxHeapSize                 = 1019215872   {product}    {ergonomic}
   size_t MinHeapSize                 = 8388608      {product}    {ergonomic}
    uintx NonNMethodCodeHeapSize      = 5826188      {pd product} {ergonomic}
    uintx NonProfiledCodeHeapSize     = 122916026    {pd product} {ergonomic}
    uintx ProfiledCodeHeapSize        = 122916026    {pd product} {ergonomic}
   size_t SoftMaxHeapSize             = 1019215872   {manageable} {ergonomic}
But we are overriding the MaxHeapSize with the -Xmx flag in the actual
running process.
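For reference, the override amounts to something like this with the
fuseki-server script (the value and dataset path are illustrative, not our
real settings):

    JVM_ARGS="-Xmx2G" ./fuseki-server --tdb2 --loc /path/to/DB /ds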
How are you sending the queries to the server?
For the original problem this occurred on a system with no queries at
all, just the metrics scraping. The /$/ping endpoint was getting checked by
a healthcheck monitoring tool (Sensu) and /$/metrics by Prometheus.
For the local bare-metal checks, where we can reproduce the process growth at
least over the medium term, we are just hitting those endpoints with curl in
a watch loop.
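Something along these lines, with the default port assumed:

    watch -n 5 'curl -s http://localhost:3030/$/ping    > /dev/null;
                curl -s http://localhost:3030/$/metrics > /dev/null'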
We've made some progress at our end, and I'll update on that back on the
previous thread rather than further confuse this one.
Dave
On 09/07/2023 20:33, Laura Morales wrote:
I'm running a job that is submitting a lot of queries to a Fuseki
server, in parallel. My problem is that Fuseki is OOM-killed and I
don't know how to fix this. Some details:
- Fuseki is queried as fast as possible. Queries take around 50-100ms
to complete so I think it's serving 10s of queries each second
Are all the queries about the same amount of work, or are some going to
cause significantly more memory use?
It is quite possible to send queries faster than the server can process
them - there is little point sending in parallel more than there are
real CPU threads to service them.
They will interfere and the machine can end up going slower (in terms of
queries per second).
I don't know exactly what the impact on the GC is, but I think the JVM delays
minor GCs when very busy, which pushes it to do major ones earlier.
A thing to try is to use less parallelism.
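For example (purely illustrative - the port, dataset name and query file are
placeholders), GNU xargs can cap the number of in-flight requests:

    # at most 4 concurrent requests; queries.txt holds one SPARQL query per line
    xargs -a queries.txt -d '\n' -P 4 -I{} \
        curl -s -o /dev/null --data-urlencode 'query={}' \
        http://localhost:3030/ds/sparql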
- Fuseki 4.8. OS is Debian 12 (minimal installation with only OS,
Fuseki, no desktop environments, uses only ~100MB of RAM)
- all the queries are read queries. No updates, inserts, or other
write queries
- all the queries are over HTTP to the Fuseki endpoint
- database is TDB2 (created with tdb2.tdbloader)
- database contains around 2.5M triples
- the machine has 8GB RAM. I've tried on another PC with 16GB and it
completes the job. On 8GB though, it won't
- with -Xmx6G it's killed earlier. With -Xmx2G it's killed later.
Either way it's always killed.
Is it getting OOM at random or do certain queries tend to push it over
the edge?
Is it that the machine (container) has 8G RAM and there is no -Xmx setting?
In that case, the default setting applies, which is 25% of RAM.
A heap dump to know where the memory is going would be useful.
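For example (pid and paths are placeholders):

    # dump the heap of the running Fuseki JVM (find the pid with jps or jcmd -l)
    jcmd <pid> GC.heap_dump /tmp/fuseki.hprof

or have the JVM write one automatically when the OOM is thrown, e.g. by adding

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp

to the server's JVM arguments.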
Is there anything that I can tweak to avoid Fuseki getting killed?
Something that isn't "just buy more RAM".
Thank you