Hi Marco,
What you are seeing is normal behavior with the default GC options.
There are other GC/JVM combinations that do not act that way.
In any case, if you want to know whether it is a memory leak or just
normal GC behaviour, I can recommend the Java option:
-XX:+HeapDumpOnOutOfMemoryError
This writes a heap dump file that can be inspected (e.g. with Eclipse
Memory Analyzer) to see what is using the memory at the point of running
out.
Another option is to use a GC algorithm such as ShenandoahGC on Java 11,
or a newer JVM if using G1GC. ShenandoahGC will clear and return RAM
much more eagerly than the default algorithm in Java version 15 and
before.
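
To tell a leak from normal GC behaviour, it helps to look at used,
committed and max heap separately. Below is a minimal sketch using the
standard java.lang.management API. The class name HeapReport and its
helper methods are made up for illustration. It reports on the JVM it
runs in, not on a running Fuseki process, and the flag lookup assumes a
HotSpot-based JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, not part of Jena or Fuseki.
final class HeapReport {

    /** Converts a byte count to whole mebibytes (rounds towards zero). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /**
     * One-line summary of a heap snapshot. "used" is live objects plus
     * garbage not yet collected; "committed" is what the JVM has
     * reserved from the OS; "max" corresponds to -Xmx (it may be
     * reported as -1 if undefined, which prints as 0 here).
     */
    static String describe(MemoryUsage heap) {
        return String.format("used=%dMiB committed=%dMiB max=%dMiB",
                toMiB(heap.getUsed()),
                toMiB(heap.getCommitted()),
                toMiB(heap.getMax()));
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe(memory.getHeapMemoryUsage()));

        // Assumption: HotSpot-based JVM. Prints whether the heap dump
        // flag is switched on for this JVM.
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println("HeapDumpOnOutOfMemoryError="
                + hotspot.getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }
}
```

If "used" keeps dropping back to roughly the same level after each
collection, that points to normal GC behaviour rather than a leak.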
Regards,
Jerven
On 27/07/2021 23:17, Marco Fiocco wrote:
Ok let me clarify the steps.
I start Fuseki as a docker container with the config you saw earlier.
Then I load the dataset with curl. After that I intend to use Fuseki for
read-only queries only.
At the moment there is absolutely no query being done, but the allocated
memory still keeps growing.
I've noticed 2 interesting things though:
Fuseki starts with Java options "-Xmx2048m -Xms2048m" and I have now
reserved 3GB of RAM for the service.
Now
- If I wait long enough, the memory keeps growing, but when it reaches
about 2.4GB it is suddenly deallocated back to the size it had just after
the dataset was loaded (about 700MB)
- If I access the Fuseki /ds endpoint to download the data, THAT also
deallocates the memory back to 700MB.
Is this normal Java behaviour?
On Tue, 27 Jul 2021 at 20:34, Andy Seaborne <[email protected]> wrote:
If the dataset is read-only, then it is always empty. The
fuseki:serviceReadWriteGraphStore is the only way to get data into the
database. The other two services are read-only.
Are you sure it is the database growing and not Java loading classes? On
a tight memory footprint, and because classes are loaded on-demand,
there are other sources of RAM usage.
Also - the heap will grow until it hits the heap size. Java does not
run a full garbage collection until it needs to, so sending SHACL
requests or read-only queries, for example, will grow the heap, and not
all the space is reclaimed until a full GC is done (Laura - this relates
to heap size < real RAM size and never swap).
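
The effect described above can be reproduced in a small standalone
program. This is only a sketch: the class name GarbageGrowth and its
methods are hypothetical, the numbers depend on the JVM and the GC in
use, and System.gc() is only a request that the JVM may ignore.

```java
// Hypothetical demo class, not part of Jena or Fuseki.
final class GarbageGrowth {

    // Keeps the latest array reachable so the JIT cannot optimise the
    // allocations away; every earlier array becomes garbage.
    static volatile byte[] sink;

    /** Heap currently in use (live objects plus uncollected garbage). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Allocates `count` arrays of `size` bytes and drops each one as
     * soon as the next is allocated. Returns the total bytes allocated.
     */
    static long churn(int count, int size) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            sink = new byte[size];
            total += sink.length;
        }
        sink = null; // nothing allocated here stays reachable
        return total;
    }

    public static void main(String[] args) {
        System.out.println("before churn: " + usedHeapBytes() + " bytes used");
        long allocated = churn(200, 1024 * 1024);
        System.out.println("allocated " + allocated + " bytes of short-lived data");
        System.out.println("after churn:  " + usedHeapBytes() + " bytes used");
        System.gc(); // a request only; the JVM decides whether to collect
        System.out.println("after gc:     " + usedHeapBytes() + " bytes used");
    }
}
```

On a typical HotSpot JVM the "after churn" figure is higher than the
"before" figure even though no data is retained, and it falls again once
a collection runs.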
Andy
On 27/07/2021 18:15, Marco Fiocco wrote:
On Tue, 27 Jul 2021 at 18:04, Andy Seaborne <[email protected]> wrote:
On 27/07/2021 14:19, Marco Fiocco wrote:
Hello,
I'm running an in-memory Fuseki 3.16 server and I see that the allocated
memory keeps growing linearly and indefinitely, even when idle.
That is strange because if there are no requests, it does no work.
Initially I reserved 1GB of memory and I've noticed that the process
gets OOM killed every 2 hours.
What pattern of usage is it getting?
Actually it's used as read-only. But the memory grows even if there is no
request.
Now I've allocated 2GB because I've read somewhere that 2GB is the
minimum for Java heaps. Is that true?
It's not that simple - you have an in-memory dataset so the space needed
is proportional to the amount of data.
At the moment the initial memory (with the dataset loaded with the REST
API) is around 600-700MB.
From that it grows by itself...
I'm waiting to see if it will get OOM killed again.
Is this a bug, or is there a better way to configure it?
If you don't need the union graph, "rdf:type ja:MemoryDataset" is a
better in-memory choice. It has a smaller footprint and (I guess in
your setup you delete data as well as add it?) manages DELETE and PUT
better for GSP. In-memory TDB is primarily a testing configuration.
Would the memory be lower if instead of in-memory we use on disk TDB or
TDB2?
Thanks
My Fuseki config is:
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix : <#> .
[] rdf:type fuseki:Server .
<#service> rdf:type fuseki:Service ;
    rdfs:label "Dataset with SHACL validation" ;
    fuseki:name "ds" ;    # See the endpoint url in build.gradle
    fuseki:serviceReadWriteGraphStore "data" ;    # SPARQL Graph store protocol (read and write)
    fuseki:endpoint [ fuseki:operation fuseki:query ;
                      fuseki:name "sparql" ] ;    # SPARQL query service
    fuseki:endpoint [ fuseki:operation fuseki:shacl ;
                      fuseki:name "shacl" ] ;    # SHACL query service
    fuseki:dataset <#dataset> .
## In memory TDB with union graph.
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "--mem--" ;
# Query timeout on this dataset (1s, 1000 milliseconds)
ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] ;
# Make the default graph be the union of all named graphs.
tdb:unionDefaultGraph true .
Thanks
Marco
--
*Jerven Tjalling Bolleman*
Principal Software Developer
*SIB | Swiss Institute of Bioinformatics*
1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
t +41 22 379 58 85
[email protected] - www.sib.swiss