I'm building a docker image with openjdk:14-alpine and cannot enable ShenandoahGC, even with the experimental feature flag. It seems that the OpenJDK build must be compiled with that feature in order to support it.
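If it helps to confirm which collector a given JDK build actually runs, the JMX garbage-collector beans report the active collectors by name. A small sketch (bean names vary by collector; with Shenandoah enabled you'd expect names like "Shenandoah Cycles", while a build compiled without Shenandoah rejects -XX:+UseShenandoahGC at startup):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcInfo {
    public static void main(String[] args) {
        // Print the garbage collectors the JVM selected at startup.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
        }
    }
}
```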
However, I've tuned the Java config with "-Xmx512m -Xms512m" and set the maximum reserved memory in my docker orchestrator to 1024MB, and it works fine. The memory keeps growing as usual, but it gets flushed by the GC at around 750MB, so at least it's never OOM killed.

Thanks
Marco

On Wed, 28 Jul 2021 at 11:01, Andy Seaborne <a...@apache.org> wrote:

>
> On 27/07/2021 22:17, Marco Fiocco wrote:
> > Ok, let me clarify the steps.
> > I start Fuseki as a docker container with the config you saw earlier.
> > Then I load the dataset with curl. After that I intend to use Fuseki
> > as "read only", query only.
> > At the moment, there is absolutely no query being done, but still the
> > allocated memory keeps growing.
>
> Does the Fuseki log show finished requests?
>
> > I've noticed 2 interesting things though:
> > Fuseki starts with Java options "-Xmx2048m -Xms2048m" and I have now
> > reserved 3GB of RAM for the service.
> > Now:
> > - If I wait long enough, the memory keeps growing, but when it reaches
> >   about 2.4GB it is suddenly deallocated back to the starting size
> >   when the dataset was loaded (about 700MB).
> > - If I access the Fuseki /ds endpoint to download the data, THAT also
> >   deallocates the memory back to 700MB.
> >
> > Is this normal Java behaviour?
>
> Yes.
>
> As queries happen, the heap will grow. There is some reclamation in
> incremental GCs, which are very quick and happen much of the time,
> but these GC cycles do not collect all unused space. A full GC can do
> that.
>
> The JDK runtime lets the heap become full with space that didn't get
> reclaimed by an incremental GC, then it triggers a full GC. At that
> point, the in-use space drops right back to the in-use space, which
> here is the data of the in-memory dataset.
>
> Very roughly, Java takes about 0.5GB over the heap size, so 2.4GB is
> what to expect before the runtime triggers a full GC, and the full GC
> will reduce the heap in use. The process size will not shrink.
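The grow-then-drop pattern Andy describes can be observed directly from JMX: used heap climbs as short-lived garbage accumulates and falls back toward the live-data size after a full collection. A minimal sketch (System.gc() is only a request, so the drop is not guaranteed):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapWatch {
    // Used heap in MB, as reported by the JMX memory bean.
    static long usedMb() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        return mem.getHeapMemoryUsage().getUsed() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Short-lived allocations, like query working state in Fuseki.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] garbage = new byte[64];
        }
        long before = usedMb();
        System.gc();               // request (not guarantee) a full collection
        long after = usedMb();
        System.out.println("used before full GC: " + before + "MB, after: " + after + "MB");
    }
}
```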
> TDB1 itself isn't perfect at releasing unused data if deletion occurs,
> but your description doesn't include any deletes.
>
>  > the process gets OOM killed every 2 hours
>
> Is that an exception, or some OS control killing the process because it
> exceeds some configured limit (ulimit, VM or container provision, ...)?
> Is that what you mean by "now reserved 3GB of RAM for the service"?
> Because if it was 2GB, that limit will be hit.
>
> The OS process will grow to more than 2GB; the heap isn't the only use
> of space in Java.
>
> Do you get an OutOfMemoryError (OOME) from Java, or does the OS (etc.)
> say the process is too big?
>
> An OOME happens when the full GC does not release enough memory to
> fulfil a request for space from Java.
>
> Capturing state at that point (Jerven's -XX:+HeapDumpOnOutOfMemoryError
> suggestion) would help identify why. But is there a request in progress
> at the time (see the Fuseki log)?
>
>      Andy
>
> > On Tue, 27 Jul 2021 at 20:34, Andy Seaborne <a...@apache.org> wrote:
> >
> >> If the dataset is read-only, then it is always empty. The
> >> fuseki:serviceReadWriteGraphStore is the only way to get data into
> >> the database. The other two services are read-only.
> >>
> >> Are you sure it is the database growing, and not Java loading
> >> classes? On a tight memory footprint, and because classes are loaded
> >> on demand, there are other sources of RAM usage.
> >>
> >> Also - the heap will grow until it hits the heap size. Java does not
> >> call a full garbage collection until it needs to, so sending SHACL
> >> requests, or read-only queries, for example, will grow the heap, and
> >> not all the space is reclaimed until a full GC is done (Laura - this
> >> relates to heap size < real RAM size, and never swap).
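The heap-dump flag mentioned above can also be enabled on an already-running JVM via the HotSpot diagnostic bean, which is handy when restarting the container isn't convenient. A sketch (the .hprof path is a hypothetical example location):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class OomeDump {
    // Turn on heap-dump-on-OOME at runtime; both flags are "manageable",
    // i.e. writable while the JVM is running. Returns the flag's value.
    static String enableHeapDump(String path) {
        HotSpotDiagnosticMXBean hs =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        hs.setVMOption("HeapDumpPath", path);                 // where the .hprof goes
        hs.setVMOption("HeapDumpOnOutOfMemoryError", "true");
        return hs.getVMOption("HeapDumpOnOutOfMemoryError").getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError=" + enableHeapDump("/tmp/fuseki.hprof"));
    }
}
```

The resulting dump can then be opened in a heap analyser (e.g. Eclipse MAT) to see what is actually retaining the memory.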
> >>
> >>      Andy
> >>
> >> On 27/07/2021 18:15, Marco Fiocco wrote:
> >>> On Tue, 27 Jul 2021 at 18:04, Andy Seaborne <a...@apache.org> wrote:
> >>>
> >>>> On 27/07/2021 14:19, Marco Fiocco wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I'm running an in-memory Fuseki 3.16 server and I see that the
> >>>>> allocated memory keeps growing linearly, indefinitely, even if
> >>>>> idle.
> >>>>
> >>>> That is strange, because if there are no requests, it does no work.
> >>>>
> >>>>> Initially I reserved 1GB of memory and I've noticed that the
> >>>>> process gets OOM killed every 2 hours.
> >>>>
> >>>> What pattern of usage is it getting?
> >>>
> >>> Actually it's used as read-only. But the memory grows even if there
> >>> is no request.
> >>>
> >>>>> Now I've allocated 2GB because I've read somewhere that 2GB is
> >>>>> the minimum for Java heaps. Is that true?
> >>>>
> >>>> It's not that simple - you have an in-memory dataset, so the space
> >>>> needed is proportional to the amount of data.
> >>>
> >>> At the moment the initial memory (with the dataset loaded via the
> >>> REST API) is around 600-700MB.
> >>> From that it grows by itself...
> >>>
> >>>>> I'm waiting to see if it will get killed again.
> >>>>> Is this a bug, or is there a better way to configure it?
> >>>>
> >>>> If you don't need the union graph, "rdf:type ja:MemoryDataset" is a
> >>>> better in-memory choice. It has a smaller footprint and (I guess in
> >>>> your setup you delete data as well as add it?) manages DELETE and
> >>>> PUT better for GSP. TDB in-memory is primarily a testing
> >>>> configuration.
> >>>
> >>> Would the memory be lower if instead of in-memory we used on-disk
> >>> TDB or TDB2?
> >>>
> >>> Thanks
> >>>
> >>>>> My Fuseki config is:
> >>>>>
> >>>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
> >>>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> >>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> >>>>> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> >>>>> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> >>>>> @prefix : <#> .
> >>>>>
> >>>>> [] rdf:type fuseki:Server .
> >>>>>
> >>>>> <#service> rdf:type fuseki:Service ;
> >>>>>     rdfs:label "Dataset with SHACL validation" ;
> >>>>>     # See the endpoint url in build.gradle
> >>>>>     fuseki:name "ds" ;
> >>>>>     # SPARQL Graph store protocol (read and write)
> >>>>>     fuseki:serviceReadWriteGraphStore "data" ;
> >>>>>     # SPARQL query service
> >>>>>     fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name "sparql" ] ;
> >>>>>     # SHACL query service
> >>>>>     fuseki:endpoint [ fuseki:operation fuseki:shacl ; fuseki:name "shacl" ] ;
> >>>>>     fuseki:dataset <#dataset> .
> >>>>>
> >>>>> ## In-memory TDB with union graph.
> >>>>> <#dataset> rdf:type tdb:DatasetTDB ;
> >>>>>     tdb:location "--mem--" ;
> >>>>>     # Query timeout on this dataset (1s, 1000 milliseconds)
> >>>>>     ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] ;
> >>>>>     # Make the default graph be the union of all named graphs.
> >>>>>     tdb:unionDefaultGraph true .
> >>>>>
> >>>>> Thanks
> >>>>> Marco
> >>>>
> >>>
> >>
> >
>
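For reference, the "rdf:type ja:MemoryDataset" switch suggested in the thread amounts to replacing the dataset block of the config above. A minimal, untested sketch (note that tdb:location and tdb:unionDefaultGraph are TDB-specific, so they have no direct equivalent here):

```turtle
## Plain in-memory dataset (non-TDB), per the suggestion above.
<#dataset> rdf:type ja:MemoryDataset ;
    # Query timeout on this dataset (1000 milliseconds)
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] .
```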