Hi Andy,
On 10/07/2023 12:18, Andy Seaborne wrote:
Laura, Dave,
This doesn't sound like the same issue but let's see.
It may well be different; if so, apologies for causing noise.
Dave - your situation isn't under high load is it?
We see the process size growth under no load other than metric scrapes.
However, growth seems faster if there's more traffic (faster scrapes), so we expect
that a high query load will make it worse. Whether "worse" means it just
reaches some asymptote faster or actually goes higher is unproven.
- Is it in a container? If so:
The original problem was in a container. But as I've said we can
reproduce the process growth on bare metal (i.e. local desktop).
Is it the container being killed OOM or
Java throwing an OOM exception?
For the original problem it's the container being OOM killed, no Java
exception.
For local tests, both in a container and on bare metal, we've just been
looking at the process size growth; we haven't run it long enough to reach an
OOM state, and on the size of machine I'm using I doubt it will on a
timescale I can wait for.
How much RAM does the container get? How many threads?
For the original problem the container had no memory limit other than
machine total of 4GB. No constraints set on threads.
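(For comparison, if we did want a cap it would be set on the container itself;
the sketch below is illustrative only - runtime, value and image name are not
our actual deployment:

    docker run --memory=3g --memory-swap=3g ... <fuseki-image>
)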
- If not a container, how many CPU Threads are there? How many cores?
For local tests, 6 cores, which should mean 12 CPU threads, but checking just
now I suspect hyperthreading isn't working on my current install, so call
it 6 of both.
- Which form of Fuseki are you using?
fuseki-server
what does
  java -XX:+PrintFlagsFinal -version \
    | grep -i 'M..HeapSize'
say?
E.g. in the container:
   size_t ErgoHeapSizeLimit           = 0            {product}    {default}
   size_t HeapSizePerGCThread         = 43620760     {product}    {default}
   size_t InitialHeapSize             = 65011712     {product}    {ergonomic}
   size_t LargePageHeapSizeThreshold  = 134217728    {product}    {default}
   size_t MaxHeapSize                 = 1019215872   {product}    {ergonomic}
   size_t MinHeapSize                 = 8388608      {product}    {ergonomic}
    uintx NonNMethodCodeHeapSize      = 5826188      {pd product} {ergonomic}
    uintx NonProfiledCodeHeapSize     = 122916026    {pd product} {ergonomic}
    uintx ProfiledCodeHeapSize        = 122916026    {pd product} {ergonomic}
   size_t SoftMaxHeapSize             = 1019215872   {manageable} {ergonomic}
But we are overriding the MaxHeapSize with the -Xmx flag in the actual
running process.
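For reference, the override amounts to something like this with the
fuseki-server script (the value and dataset path are illustrative, not our
real settings):

    JVM_ARGS="-Xmx2G" ./fuseki-server --tdb2 --loc /path/to/DB /ds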
How are you sending the queries to the server?
For the original problem this occurred on a system with no queries at
all, just the metrics scraping. The /$/ping endpoint was getting checked by
a healthcheck monitoring tool (Sensu) and /$/metrics by Prometheus.
For the local bare-metal checks, where we can reproduce the process growth at
least over the medium term, we are just hitting those endpoints with curl in
a watch loop.
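Something along these lines, with the default port assumed:

    watch -n 5 'curl -s http://localhost:3030/$/ping    > /dev/null;
                curl -s http://localhost:3030/$/metrics > /dev/null'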
We've made some progress at our end, and I'll update on that back on the
previous thread rather than further confuse this one.
Dave
On 09/07/2023 20:33, Laura Morales wrote:
I'm running a job that is submitting a lot of queries to a Fuseki
server, in parallel. My problem is that Fuseki is OOM-killed and I
don't know how to fix this. Some details:
- Fuseki is queried as fast as possible. Queries take around 50-100ms
to complete so I think it's serving 10s of queries each second
Are all the queries about the same amount of work, or are some going to
cause significantly more memory use?
It is quite possible to send queries faster than the server can process
them - there is little point sending in parallel more than there are
real CPU threads to service them.
They will interfere and the machine can end up going slower (in terms of
queries per second).
I don't know exactly what the impact on the GC is, but I think the JVM delays
minor GCs when very busy, which pushes it to do major ones earlier.
A thing to try is to use less parallelism.
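For example (purely illustrative - the port, dataset name and query file are
placeholders), GNU xargs can cap the number of in-flight requests:

    # at most 4 concurrent requests; queries.txt holds one SPARQL query per line
    xargs -a queries.txt -d '\n' -P 4 -I{} \
        curl -s -o /dev/null --data-urlencode 'query={}' \
        http://localhost:3030/ds/sparql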
- Fuseki 4.8. OS is Debian 12 (minimal installation with only OS,
Fuseki, no desktop environments, uses only ~100MB of RAM)
- all the queries are read queries. No updates, inserts, or other
write queries
- all the queries are over HTTP to the Fuseki endpoint
- database is TDB2 (created with tdb2.tdbloader)
- database contains around 2.5M triples
- the machine has 8GB RAM. I've tried on another PC with 16GB and it
completes the job. On 8GB though, it won't
- with -Xmx6G it's killed earlier. With -Xmx2G it's killed later.
Either way it's always killed.
Is it getting OOM at random or do certain queries tend to push it over
the edge?
Is it that the machine (container) has 8G RAM and there is no -Xmx setting?
In that case, the default setting applies, which is 25% of RAM.
A heap dump to know where the memory is going would be useful.
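For example (pid and paths are placeholders):

    # dump the heap of the running Fuseki JVM (find the pid with jps or jcmd -l)
    jcmd <pid> GC.heap_dump /tmp/fuseki.hprof

or have the JVM write one automatically when the OOM is thrown, e.g. by adding

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp

to the server's JVM arguments.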
Is there anything that I can tweak to avoid Fuseki getting killed?
Something that isn't "just buy more RAM".
Thank you