Re: Fwd: Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

jaanam Fri, 09 Apr 2021 06:12:57 -0700

Hi,

Could you suggest an optimal jena-fuseki heap size for my case? I'm sending a 50 MB file to my jena-fuseki memory-based dataset every 5 minutes.


Jaana

(And should this actually be set on the JVM?)

jaa...@kolumbus.fi wrote on 8.4.2021 18:03:

Hello,

Still one question regarding this old issue. The previous answer said:
The heap size by default is quite small in the scripts. It might be an
idea to increase it a bit to give query working space but 0.5 million
is really not very big.
What would be the suitable heap size in my case ?
(And then a very basic additional question: if I'm running the JVM and
jena-fuseki in the same Docker container, there's a risk that the JVM
would take all free memory, so I've set the JVM heap size to 2 GB
using JVM_ARGS=-Xmx2g. So which variables should I use to set the
heap size for jena-fuseki?)
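A minimal sketch of the Docker side of this, assuming the image's entrypoint passes JVM_ARGS through to the stock fuseki-server script (which reads JVM_ARGS for its java options) and that the server listens on port 3030; the function name run_fuseki_container is hypothetical. It caps the whole container with Docker's --memory flag and keeps the -Xmx value below that cap, since the JVM also uses memory outside the heap (metaspace, thread stacks, code cache). The 2g figure is only the value mentioned above, not a recommendation.

```shell
#!/bin/sh
# Hypothetical helper: start the Fuseki image with a container memory limit
# and a smaller JVM heap limit. Assumes the image's entrypoint honours
# JVM_ARGS the way the stock fuseki-server script does.
run_fuseki_container() {
  heap="$1"           # JVM heap cap, e.g. 2g
  container_mem="$2"  # container limit, e.g. 3g; leave headroom above the heap
  docker run -d \
    --memory="$container_mem" \
    -e JVM_ARGS="-Xmx$heap" \
    -p 3030:3030 \
    blankdots/jena-fuseki:fuseki3.17.0
}
```

For example, `run_fuseki_container 2g 3g`; the 1 GB of headroom is an assumption to tune against the actual non-heap usage.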

Br, Jaana

Andy Seaborne wrote on 10.3.2021 17:04:
On 10/03/2021 02:33, jaa...@kolumbus.fi wrote:
Hi, thanks for your quick answer, and please see my answers below!
How many triples?
And is it new data to replace the old data or in addition to the existing data?
476955 triples. Most of it will be just the same as the old data; just some triples may change, and some new triples may be added.
This is a TDB1 database?
The jena-fuseki UI does not mention TDB1, but this is persistent and not TDB2.
But in our use case memory-based datasets might also work; as far as I've tested on my PC, they seem to work even better than persistent ones. What do you think?
In-memory should be fine. Obviously, it's lost when the server exits,
but it sounds like the data isn't the primary copy, and loading 476955
triples at start-up is not big.

The heap size by default is quite small in the scripts. It might be an
idea to increase it a bit to give query working space but 0.5 million
is really not very big.
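A minimal sketch of that in-memory setup: the fuseki-server script's --file option creates a non-persistent, in-memory dataset initialised from a file, and JVM_ARGS sets the heap. The function name, the use of FUSEKI_HOME to locate the script, and the -Xmx2g fallback are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: start Fuseki with an in-memory dataset loaded from a
# Turtle file at start-up. The data is lost when the server exits.
start_fuseki_in_memory() {
  ttl_file="$1"   # e.g. the 50 MB export
  dataset="$2"    # e.g. /pxmeta_hub_fed_prod
  # Keep any JVM_ARGS already set; otherwise fall back to a 2 GB heap cap.
  JVM_ARGS="${JVM_ARGS:--Xmx2g}" \
    "${FUSEKI_HOME:-.}/fuseki-server" --file="$ttl_file" "$dataset"
}
```

The heap then has to hold the whole dataset plus query working space, which is why the default heap in the scripts may need raising.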

    Andy
Br Jaana



Andy Seaborne wrote on 9.3.2021 19:58:
Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:
Hello,
I've run into the following problem with jena-fuseki (should I create a bug ticket?):
We need to update a jena-fuseki dataset every 5 minutes with a 50 MB TTL file.
How many triples?
And is it new data to replace the old data or in addition to the existing data?
This causes the memory consumption on the machine where jena-fuseki is running to increase by gigabytes.
This was first detected with jena-fuseki 3.8 and later with jena-fuseki 3.17.
To be exact, I ran blankdots/jena-fuseki:fuseki3.17.0 in a Docker container, continuously posting that TTL file into the same dataset (pxmeta_hub_fed_prod).
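For reference, a minimal sketch of such an upload over the SPARQL Graph Store Protocol, assuming the dataset exposes Fuseki's read-write `data` endpoint on localhost:3030. It also shows the replace-versus-add distinction raised in this thread: PUT replaces the default graph, while POST adds the file's triples to what is already there. The function name send_ttl is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: upload a Turtle file to a dataset's default graph via
# the SPARQL Graph Store Protocol.
#   replace -> HTTP PUT  (default graph is replaced by the file's contents)
#   add     -> HTTP POST (file's triples are added to the existing ones)
send_ttl() {
  mode="$1"       # replace | add
  ttl_file="$2"   # e.g. pxmeta.ttl
  endpoint="$3"   # e.g. http://localhost:3030/pxmeta_hub_fed_prod/data
  case "$mode" in
    replace) method=PUT ;;
    add)     method=POST ;;
    *) echo "send_ttl: mode must be 'replace' or 'add'" >&2; return 2 ;;
  esac
  curl --fail --silent --show-error \
    -X "$method" \
    -H 'Content-Type: text/turtle' \
    --data-binary "@$ttl_file" \
    "$endpoint?default"
}
```

For example, `send_ttl replace pxmeta.ttl http://localhost:3030/pxmeta_hub_fed_prod/data` (the file name is illustrative).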
This is a TDB1 database?

TDB2 is better at this - the database still grows but there is a way
to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html
The database grows for two reasons. First, it allocates space in sparse
files in 8M chunks, but that space does not count in du until it is
actually used. Second, the space for deleted data is not fully recycled
across transactions, because it may be in use by a concurrent operation.
(In TDB1, block reference counting would be very difficult to do; in
TDB2 the solution is compaction.)
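A minimal sketch of triggering that compaction on a running server, assuming a TDB2 dataset (it does not apply to the TDB1 database in this thread) and a Fuseki version that includes JENA-1987, which exposes compaction as a POST to the admin path /$/compact/{name}. The helper name is hypothetical; for an offline database, the tdb2.tdbcompact command-line tool is the alternative.

```shell
#!/bin/sh
# Hypothetical helper: ask a running Fuseki server to compact a TDB2 dataset
# via the admin endpoint POST /$/compact/{name}.
compact_tdb2() {
  server="$1"    # e.g. http://localhost:3030
  dataset="$2"   # e.g. pxmeta_hub_fed_prod
  # "\$" keeps the literal "$" in the admin path.
  curl --fail --silent --show-error -X POST "$server/\$/compact/$dataset"
}
```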

    Andy
See the output of the command "du -h | sort -hr | head -30" below. Attached is the shell script that I was executing during that time period.
root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#
3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#
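To track this growth in a log rather than by running du by hand, a small hypothetical helper along these lines could be called after each upload; it prints a UTC timestamp and the total size of one database directory. The function name is an assumption; du -sh and date -u are standard tools.

```shell
#!/bin/sh
# Hypothetical helper: print "<UTC timestamp> <size>" for one directory,
# e.g. /data/fuseki/databases/pxmeta_hub_fed_prod.
log_db_size() {
  db_dir="$1"
  size=$(du -sh "$db_dir" | cut -f1)   # du prints "<size><TAB><path>"
  printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$size"
}
```

For example, `log_db_size /data/fuseki/databases/pxmeta_hub_fed_prod >> db_size.log`.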

Br, Jaana

