Hi Jaana,
On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:
hello,
I've run into the following problem with jena-fuseki (should I create a bug
ticket?):
We need to update a jena-fuseki dataset every 5 minutes with a 50 MB
ttl file.
How many triples?
And is it new data to replace the old data, or in addition to the
existing data?
This causes the memory consumption on the machine where
jena-fuseki is running to increase by gigabytes.
This was first detected with jena-fuseki 3.8 and later with jena-fuseki 3.17.
To be exact, I ran blankdots/jena-fuseki:fuseki3.17.0 in a Docker
container, continuously posting that ttl file to the same dataset
(pxmeta_hub_fed_prod).
This is a TDB1 database?
TDB2 is better at this - the database still grows but there is a way to
compact the database live.
JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html
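As a rough sketch, compaction can be requested over the Fuseki HTTP admin interface with a POST to `/$/compact/{name}`. The server URL below is an assumption, as are the `RUN_COMPACT` guard and the variable names, which exist only for this example. The dataset must be TDB2, and the admin endpoints may be restricted (for example to localhost) depending on the server's security configuration.

```shell
#!/bin/sh
# Assumed values: adjust to your deployment.
FUSEKI_URL="${FUSEKI_URL:-http://localhost:3030}"
DATASET="${DATASET:-pxmeta_hub_fed_prod}"

# Admin endpoint for compaction. The '$' is part of the path and is escaped
# so the shell does not try to expand it.
compact_url="${FUSEKI_URL}/\$/compact/${DATASET}"
echo "POST ${compact_url}"

# Hypothetical guard: only contact the server when explicitly asked,
# so the script can be dry-run safely.
if [ "${RUN_COMPACT:-0}" = "1" ]; then
    curl --fail --silent --show-error -X POST "$compact_url"
fi
```

Running it with `RUN_COMPACT=1` sends the request; without it, the script only prints the URL it would call.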
The database grows for two reasons. First, it allocates space in sparse
files in 8M chunks, but that space does not count in du until it is
actually used. Second, the space for deleted data is not fully recycled
across transactions because it may be in use by a concurrent operation.
(Block ref counting would be very difficult to do in TDB1; in TDB2 the
solution is compaction.)
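The sparse-file effect can be seen with a small experiment. This is only an illustration, assuming GNU coreutils (`truncate`, `du --apparent-size`) and a filesystem that supports sparse files; it does not touch a TDB database.

```shell
#!/bin/sh
# Create an 8 MiB sparse file: the length is set, but no data is written.
f=$(mktemp)
truncate -s 8M "$f"

# Logical (apparent) size in KiB versus the space actually allocated in KiB.
apparent=$(du -k --apparent-size "$f" | cut -f1)
allocated=$(du -k "$f" | cut -f1)

echo "apparent size: ${apparent} KiB"
echo "allocated:     ${allocated} KiB"

rm -f "$f"
```

On a filesystem with sparse-file support the allocated figure is expected to be much smaller than the apparent size until data is written into the file, which is why plain `du` only grows as the chunks fill up.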
Andy
See the output of the command "du -h | sort -hr|head -30" below. Attached is
the shell script that I was running during that time period.
root@3d53dc3fdf8d:/# alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G .
8.5G ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G ./data/fuseki/databases
8.5G ./data/fuseki
8.5G ./data
root@3d53dc3fdf8d:/# date
Tue Mar 9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#
3.5G .
3.0G ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G ./data/fuseki/databases
3.0G ./data/fuseki
3.0G ./data
root@3d53dc3fdf8d:/# date
Tue Mar 9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#
Br, Jaana