Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:
hello,

I've run into the following problem with jena-fuseki (should I create a bug ticket?):

We need to update a jena-fuseki dataset every 5 minutes with a 50 MB ttl file.

How many triples?
And is it new data to replace the old data, or is it in addition to the existing data?

This causes the memory consumption on the machine where jena-fuseki is running to increase by gigabytes.

This was first detected with jena-fuseki 3.8 and later with jena-fuseki 3.17.

To be exact, I ran blankdots/jena-fuseki:fuseki3.17.0 in a docker container, continuously posting that ttl file into the same dataset (pxmeta_hub_fed_prod).

This is a TDB1 database?

TDB2 is better at this - the database still grows but there is a way to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html
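
As a rough sketch of what triggering that compaction over HTTP might look like: the Fuseki admin protocol describes a POST to /$/compact/{name} for TDB2 datasets. The host and port below are assumptions (a default local server), the helper names are made up for illustration, and the call only makes sense if the dataset really is TDB2 and the server version includes the JENA-1987 change.

```python
import urllib.request

# Assumption: Fuseki is reachable at this address; adjust for your setup.
FUSEKI_BASE = "http://localhost:3030"
# Dataset name taken from this thread.
DATASET = "pxmeta_hub_fed_prod"


def build_compact_request(base, dataset):
    """Build (but do not send) a POST request for the admin compact endpoint."""
    url = f"{base.rstrip('/')}/$/compact/{dataset}"
    # An empty body with an explicit method makes this a POST.
    return urllib.request.Request(url, data=b"", method="POST")


def compact(base=FUSEKI_BASE, dataset=DATASET):
    """Send the compaction request and return the server's response body."""
    with urllib.request.urlopen(build_compact_request(base, dataset)) as resp:
        return resp.read().decode("utf-8")

# Usage (needs a running server with a TDB2 dataset):
#   print(compact())
```

Compaction runs as a background task on the server, so the response describes the task rather than the finished result. If the admin endpoints are protected, the request would also need credentials, which this sketch leaves out.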

The database grows for two reasons. First, it allocates space in sparse files in 8M chunks, but that space does not count in du until it is actually used. Second, the space for deleted data is not fully recycled across transactions because it may still be in use by a concurrent operation. (Block reference counting would be very difficult to do in TDB1; in TDB2 the solution is compaction.)
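
To illustrate the sparse-file point, here is a small, generic sketch (not TDB code) that extends an empty file to 8 MiB without writing any data and compares the apparent size with the space actually allocated. It assumes a POSIX system and a filesystem that supports sparse files; the function name is hypothetical.

```python
import os
import tempfile

CHUNK = 8 * 1024 * 1024  # 8 MiB, matching the chunk size mentioned above


def sparse_sizes(length=CHUNK):
    """Extend an empty temp file to `length` bytes; return (apparent, allocated)."""
    fd, path = tempfile.mkstemp()
    try:
        # Grow the file without writing data; this leaves a hole on
        # filesystems that support sparse files.
        os.ftruncate(fd, length)
        st = os.fstat(fd)
        # st_size is the apparent size; st_blocks counts 512-byte units
        # actually allocated, which is what du reports by default.
        return st.st_size, st.st_blocks * 512
    finally:
        os.close(fd)
        os.remove(path)


apparent, allocated = sparse_sizes()
print(f"apparent size: {apparent} bytes, allocated on disk: {allocated} bytes")
```

On a filesystem with sparse-file support the allocated figure is typically far below the apparent size. With GNU du, `du --apparent-size` shows the apparent size, while plain `du` shows the allocated space.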

    Andy


See the output of the command "du -h | sort -hr|head -30" below. I have attached the shell script that I was running during that period.

root@3d53dc3fdf8d:/# alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data

root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#


3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana
