OK, let me clarify the steps. I start Fuseki as a Docker container with the config you saw earlier. Then I load the dataset with curl. After that, I intend to use Fuseki for read-only queries. At the moment no queries are being made at all, but the allocated memory still keeps growing.
I've noticed two interesting things, though. Fuseki starts with the Java options "-Xmx2048m -Xms2048m", and I have now reserved 3GB of RAM for the service. Now:

- If I wait long enough, the memory keeps growing, but when it reaches about 2.4GB it is suddenly released, dropping back to the size it had right after the dataset was loaded (about 700MB).
- If I access the Fuseki /ds endpoint to download the data, THAT also releases the memory back to 700MB.

Is this normal Java behaviour?

On Tue, 27 Jul 2021 at 20:34, Andy Seaborne wrote:

> If the dataset is read-only, then it is always empty. The
> fuseki:serviceReadWriteGraphStore is the only way to get data into the
> database. The other two services are read-only.
>
> Are you sure it is the database growing and not Java loading classes? On
> a tight memory footprint, and because classes are loaded on demand,
> there are other sources of RAM usage.
>
> Also, the heap will grow until it hits the heap size. Java does not
> call a full garbage collection until it needs to, so sending SHACL
> requests or read-only queries, for example, will grow the heap, and not
> all the space is reclaimed until a full GC is done (Laura - this relates
> to heap size < real RAM size and never swap).
>
> Andy
>
> On 27/07/2021 18:15, Marco Fiocco wrote:
> > On Tue, 27 Jul 2021 at 18:04, Andy Seaborne wrote:
> >
> >> On 27/07/2021 14:19, Marco Fiocco wrote:
> >>> Hello,
> >>>
> >>> I'm running an in-memory Fuseki 3.16 server and I see that the allocated
> >>> memory keeps growing linearly and indefinitely, even when idle.
> >>
> >> That is strange because if there are no requests, it does no work.
> >>
> >>> Initially I reserved 1GB of memory and I've noticed that the process
> >>> gets OOM-killed every 2 hours.
> >>
> >> What pattern of usage is it getting?
> >>
> >
> > Actually, it's used as read-only. But the memory grows even if there are no
> > requests.
> >
> >>> Now I've allocated 2GB because I've read somewhere that 2GB is the
> >>> minimum for Java heaps. Is that true?
> >>
> >> It's not that simple - you have an in-memory dataset, so the space needed
> >> is proportional to the amount of data.
> >>
> >
> > At the moment, the initial memory (with the dataset loaded with the REST
> > API) is around 600-700MB.
> > From there it grows by itself...
> >
> >>> I'm waiting to see if it will get killed again.
> >>> Is this a bug, or is there a better way to configure it?
> >>
> >> If you don't need the union graph, "rdf:type ja:MemoryDataset" is a
> >> better in-memory choice. It has a smaller footprint and (I guess in
> >> your setup you delete data as well as add it?) manages DELETE and PUT
> >> better for GSP. TDB in-memory is primarily a testing configuration.
> >>
> >
> > Would the memory be lower if, instead of in-memory, we used on-disk TDB or
> > TDB2?
> >
> > Thanks
> >
> >>>
> >>> My Fuseki config is:
> >>>
> >>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> >>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> >>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> >>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> >>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> >>> @prefix : <#> .
> >>>
> >>> [] rdf:type fuseki:Server .
> >>>
> >>> <#service> rdf:type fuseki:Service ;
> >>>     rdfs:label "Dataset with SHACL validation" ;
> >>>     fuseki:name "ds" ;                          # See the endpoint url in build.gradle
> >>>     fuseki:serviceReadWriteGraphStore "data" ;  # SPARQL Graph store protocol (read and write)
> >>>     fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name "sparql" ] ;  # SPARQL query service
> >>>     fuseki:endpoint [ fuseki:operation fuseki:shacl ; fuseki:name "shacl" ] ;   # SHACL query service
> >>>     fuseki:dataset <#dataset> .
> >>>
> >>> ## In memory TDB with union graph.
> >>> <#dataset> rdf:type tdb:DatasetTDB ;
> >>>     tdb:location "--mem--" ;
> >>>     # Query timeout on this dataset (1s, 1000 milliseconds)
> >>>     ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] ;
> >>>     # Make the default graph be the union of all named graphs.
> >>>     tdb:unionDefaultGraph true .
> >>>
> >>> Thanks
> >>> Marco
> >>>
> >>
> >
>
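On the "is this normal Java behaviour?" question, here is a minimal sketch (plain Java, not Fuseki code) of what Andy describes: objects that are no longer referenced stay on the heap, counted as "used", until a garbage collection runs. `HeapGrowthDemo`, `churn` and the allocation sizes are made-up names and numbers for illustration only. `System.gc()` is just a request the JVM may ignore, and minor collections can also happen during the loop, so the printed figures will differ between runs and collectors.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapGrowthDemo {

    static final long MB = 1024 * 1024;

    // Heap currently in use: the committed heap minus its free part.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Allocate and immediately drop many 64 KiB arrays. Nothing references
    // them afterwards, but they occupy the heap until a GC reclaims them.
    // The checksum only gives the loop an observable result.
    static long churn(int rounds) {
        long checksum = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] tmp = new byte[64 * 1024];
            tmp[0] = (byte) i;
            checksum += tmp[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max heap (-Xmx): %d MB%n", rt.maxMemory() / MB);
        System.out.printf("used at start:   %d MB%n", usedHeapBytes() / MB);

        churn(2_000); // about 125 MiB of short-lived garbage in total
        System.out.printf("used after work: %d MB%n", usedHeapBytes() / MB);

        System.gc(); // a hint only; the JVM decides whether to collect
        System.out.printf("used after GC:   %d MB%n", usedHeapBytes() / MB);

        // Keep a reference to a list so the imports are used meaningfully:
        // retained objects, unlike the garbage above, survive a GC.
        List<byte[]> retained = new ArrayList<>();
        retained.add(new byte[1024]);
        System.out.printf("retained arrays: %d%n", retained.size());
    }
}
```

To compare with the container, it can be run with the same flags, e.g. `java -Xms2048m -Xmx2048m HeapGrowthDemo.java` (single-file launch needs Java 11 or later).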

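Andy's other point is that the heap is not the only consumer of RAM: classes are loaded on demand, and their metadata lives outside the heap. A small sketch using the standard `java.lang.management` API reports heap usage, non-heap usage and the loaded-class count from inside a JVM. `MemoryReport` and `describe` are hypothetical names; reading the same figures from the running Fuseki process (for instance with JConsole) would assume JMX access to the container, which may need extra setup.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryReport {

    static final long MB = 1024 * 1024;

    // Format one MemoryUsage; getMax() returns -1 when no maximum is defined.
    static String describe(String label, MemoryUsage u) {
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() / MB) + " MB";
        return String.format("%-9s used=%d MB committed=%d MB max=%s",
                label, u.getUsed() / MB, u.getCommitted() / MB, max);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();

        // Heap: bounded by -Xmx; objects (e.g. dataset contents) live here.
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        // Non-heap: class metadata, JIT code cache, etc. Not bounded by -Xmx.
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
        System.out.println("loaded classes: " + classes.getLoadedClassCount());
    }
}
```

The heap figure is capped by -Xmx, but the non-heap figure, plus native memory this API does not report (thread stacks, for example), comes on top of it, which is why the process as a whole can use more RAM than the -Xmx value.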