On 27/07/2021 22:17, Marco Fiocco wrote:
Ok let me clarify the steps.
I start Fuseki as a docker container with the config you saw earlier.
Then I load the dataset with curl. After that I intend to use Fuseki as
"read only" query only.
At the moment there are absolutely no queries being made, but the allocated
memory still keeps growing.

Does the Fuseki log show finished requests?


I've noticed 2 interesting things though:
Fuseki starts with Java options "-Xmx2048m -Xms2048m" and I have now
reserved 3GB of RAM for the service.
Now
- If wait enough, the memory keeps growing but when it reaches about 2.4GB
it is suddenly deallocated back to starting size when the dataset was
loaded (about 700MB)
- if I access the Fuseki /ds endpoint to download the data, THAT also
deallocates the memory back to 700MB.

Is this normal Java behaviour?

Yes.

As queries happen, the heap will grow. There is some reclamation in incremental GCs, which are very quick and happen frequently, but these GC cycles do not collect all unused space. A full GC can do that.

The JDK runtime lets the heap fill up with space that didn't get reclaimed by the incremental GCs, then it triggers a full GC. At that point, the in-use space drops right back to the real in-use space, which is the data of the in-memory dataset.

Very roughly, Java takes about 0.5GB over the heap size, so 2.4GB is what to expect before the runtime triggers a full GC, and the full GC will reduce the heap in use. The process size will not shrink.
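You can see this effect outside Fuseki with a small standalone sketch (`GcDemo` is just an illustrative name, not anything from Jena): it churns through short-lived allocations, then requests a full collection with System.gc() and prints the reported in-use heap before and after.

```java
// Minimal sketch (not Fuseki-specific): allocate short-lived garbage,
// then request a full GC and compare the reported in-use heap.
public class GcDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Create garbage: each array is unreachable as soon as the loop moves on,
        // but incremental GCs may leave some of it uncollected.
        for (int i = 0; i < 200; i++) {
            byte[] junk = new byte[1024 * 1024]; // 1 MB, immediately discarded
            junk[0] = 1;
        }
        long before = rt.totalMemory() - rt.freeMemory();
        System.gc(); // request a full collection (the JVM may ignore this, but usually honours it)
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("used before GC: " + (before / (1024 * 1024)) + " MB");
        System.out.println("used after GC: " + (after / (1024 * 1024)) + " MB");
    }
}
```

The "used after GC" figure is the genuinely live data, which for an in-memory Fuseki corresponds to the loaded dataset; the gap between the two numbers is the garbage the incremental cycles had not yet reclaimed.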

TDB1 itself isn't perfect at releasing unused data if deletion occurs, but your description doesn't include any deletes.

the process gets OOM killed every 2 hours

Is that a Java exception, or some OS control killing the process because it exceeds some configured limit (ulimit, VM or container provision, ...)? Is that what you mean by "now reserved 3GB of RAM for the service"? Because if it were 2GB, that limit would be hit.

The OS process will grow to more than 2G, the heap isn't the only use of space in Java.

Do you get an OutOfMemoryError (OOME) from Java, or does the OS (etc.) say the process is too big?

An OOME happens when the full GC does not release memory to fulfil a request for space from Java.

Capturing state at that point (Jerven's -XX:+HeapDumpOnOutOfMemoryError suggestion) would help identify why. But is there a request in progress at the time (see the Fuseki log)?

    Andy

On Tue, 27 Jul 2021 at 20:34, Andy Seaborne <[email protected]> wrote:

If the dataset is read-only, then it is always empty. The
fuseki:serviceReadWriteGraphStore is the only way to get data into the
database. The other two services are read-only.

Are you sure it is the database growing and not Java loading classes? On
a tight memory footprint, and because classes are loaded on-demand,
there are other sources of RAM usage.

Also - The heap will grow until it hits the heap size. Java does not
call a full garbage collection until it needs to so sending SHACL
requests, or read-only queries, for example, will grow the heap and not
all the space is reclaimed until a full GC is done (Laura - this relates
to heap size < real RAM size and never swap).

      Andy

On 27/07/2021 18:15, Marco Fiocco wrote:
On Tue, 27 Jul 2021 at 18:04, Andy Seaborne <[email protected]> wrote:



On 27/07/2021 14:19, Marco Fiocco wrote:
Hello,

I'm running a in-memory Fuseki 3.16 server and I see that the allocated
memory keeps growing linearly indefinitely even if idle.

That is strange because if there are no requests, it does no work.

Initially I reserved 1GB of memory and I've noticed that the process
gets OOM killed every 2 hours.

What pattern of usage is it getting?


Actually it's used as read-only. But the memory grows even if there is no
request.


    > Now I've allocated 2GB because I've read somewhere that 2GB is the
minimum for Java heaps. Is that true?

It's not that simple - you have an in-memory dataset so the space needed
is proportional to the amount of data.


At the moment the initial memory (with the dataset loaded via the REST API)
is around 600-700MB.
  From that it grows by itself...


I'm waiting to see if it will get killed again.
Is this a bug or there is a better way to config it?

If you don't need the union graph, "rdf:type ja:MemoryDataset" is a
better in-memory choice. It has a smaller footprint and (I guess in
your setup you delete data as well as add it?) manages DELETE and PUT
better for GSP. TDB in-memory is primarily a testing configuration.


Would the memory be lower if instead of in-memory we use on disk TDB or
TDB2?

Thanks



My Fuseki config is:

@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix :        <#> .

[] rdf:type fuseki:Server .

<#service> rdf:type fuseki:Service ;
       rdfs:label          "Dataset with SHACL validation" ;
       fuseki:name         "ds" ;                          # See the endpoint url in build.gradle
       fuseki:serviceReadWriteGraphStore "data" ;          # SPARQL Graph store protocol (read and write)
       fuseki:endpoint  [ fuseki:operation fuseki:query ;
                          fuseki:name "sparql"  ] ;        # SPARQL query service
       fuseki:endpoint  [ fuseki:operation fuseki:shacl ;
                          fuseki:name "shacl" ] ;          # SHACL query service
       fuseki:dataset      <#dataset> .

## In memory TDB with union graph.
<#dataset> rdf:type   tdb:DatasetTDB ;
     tdb:location "--mem--" ;
     # Query timeout on this dataset (1s, 1000 milliseconds)
     ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] ;
     # Make the default graph be the union of all named graphs.
     tdb:unionDefaultGraph true .

Thanks
Marco




