Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

jaanam Tue, 30 Mar 2021 20:17:33 -0700

Hello,

I've been trying TDB2 with compaction. I have two TDB2 datasets in my jena-fuseki, and each of them receives a 50 MB update every 5 minutes.

At the same time, they are compacted hourly by the attached script.

At some point I start getting these messages:

  + curl -i -XPOST 'localhost:8061/$/compact/pxmeta_hub_fed_prod'

       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                      Dload  Upload   Total   Spent    Left  Speed
       0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
     100    59  100    59    0     0  59000      0 --:--:-- --:--:-- --:--:-- 59000

     HTTP/1.1 400 Bad Request
     Date: Tue, 30 Mar 2021 23:54:47 GMT
     Fuseki-Request-Id: 2706
     Content-Type: text/plain;charset=utf-8
     Cache-Control: must-revalidate,no-cache,no-store
     Pragma: no-cache
     Content-Length: 59

     Async task request rejected - exceeds the limit of 4 tasks

After this, queries on the dataset in question invoked from the UI no longer return.

What is wrong now?

Br, Jaana


On 9.3.2021 19:58, Andy Seaborne wrote:

Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:
Hello,
I've run into the following problem with jena-fuseki (should I create a bug ticket?):
We need to update a jena-fuseki dataset every 5 minutes with a 50 MB ttl file.
How many triples?
And is it new data to replace the old data, or in addition to the existing data?
This causes the memory consumption on the machine where jena-fuseki is running to increase by gigabytes.
This was first detected with jena-fuseki 3.8 and later with jena-fuseki 3.17.
To be exact, I ran blankdots/jena-fuseki:fuseki3.17.0 in a Docker container, continuously posting that ttl file into the same dataset (pxmeta_hub_fed_prod).
Is this a TDB1 database?

TDB2 is better at this: the database still grows, but there is a way
to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html
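Compaction triggered this way runs as an asynchronous Fuseki task. A minimal sketch of starting it from a script and waiting for it to finish, assuming Fuseki on localhost:3030, the admin endpoints /$/compact/<name> and /$/tasks/<id>, and a task description that gains a "finished" field once the task completes (check all of these against your Fuseki version). The helper names task_id, task_finished and compact_and_wait are hypothetical, and the JSON is scraped with sed rather than properly parsed.

```sh
#!/bin/sh
# Sketch only: FUSEKI_URL, the endpoint paths and the JSON field names are
# assumptions to check against your Fuseki version.
FUSEKI_URL=${FUSEKI_URL:-http://localhost:3030}

# Print the taskId found in a JSON reply read from stdin (crude sed scrape).
task_id() {
    sed -n 's/.*"taskId" *: *"\{0,1\}\([^", }]*\).*/\1/p' | head -n 1
}

# Succeed if the task description on stdin contains a "finished" field.
task_finished() {
    grep -q '"finished"'
}

# Start compaction of dataset $1, then poll its task every 10 s for up to 1 h.
compact_and_wait() {
    id=$(curl -s -XPOST "$FUSEKI_URL/\$/compact/$1" | task_id)
    if [ -z "$id" ]; then
        echo "compaction of $1 was not started" >&2
        return 1
    fi
    tries=0
    until curl -s "$FUSEKI_URL/\$/tasks/$id" | task_finished; do
        tries=$((tries + 1))
        if [ "$tries" -ge 360 ]; then
            echo "task $id for $1 still running after 1 h" >&2
            return 1
        fi
        sleep 10
    done
}
```

For example, `compact_and_wait pxmeta_hub_fed_prod` returns 0 once the task reports a finish time. If failed tasks also get a "finished" field, they end the wait too, so checking the task's "success" field may be worthwhile.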

The database grows for two reasons. First, it allocates space in sparse
files in 8M chunks, but that space does not count in du until it is
actually used. Second, the space for deleted data is not fully recycled
across transactions because it may be in use by a concurrent operation.
(Block reference counting would be very difficult to do in TDB1; in TDB2
the solution is compaction.)
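The first point, that space is reserved in sparse files but only counted by du once it is written, can be seen with an ordinary sparse file. A small sketch, assuming GNU coreutils (truncate, dd, du --apparent-size) and a filesystem with sparse-file support; the 8M size mirrors the chunk size mentioned above, and the file is only a stand-in, not a TDB2 file.

```sh
#!/bin/sh
# Sketch: a sparse file has its full length, but only written blocks count in du.
# Assumes GNU coreutils and a filesystem that supports sparse files.
dir=$(mktemp -d)
f="$dir/chunk.dat"

truncate -s 8M "$f"                                  # set length to 8 MiB, write nothing
apparent_kb=$(du -k --apparent-size "$f" | cut -f1)  # logical length in KiB
allocated_kb=$(du -k "$f" | cut -f1)                 # KiB actually allocated on disk
echo "empty chunk:    apparent ${apparent_kb}K, allocated ${allocated_kb}K"

# Write 1 MiB of real data at the start without truncating; the length stays
# 8 MiB while the allocated size normally grows to roughly 1 MiB.
dd if=/dev/urandom of="$f" bs=1M count=1 iflag=fullblock conv=notrunc 2>/dev/null
used_kb=$(du -k "$f" | cut -f1)
echo "after 1M write: allocated ${used_kb}K"

rm -rf "$dir"
```

Comparing `du -sh` with `du -sh --apparent-size` on a database directory shows the same kind of gap between reserved and used space.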

    Andy
See the output of the command "du -h | sort -hr | head -30" below. Attached is the shell script that I was executing during that time period.
root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#
3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana

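#!/bin/sh
set -x

DB_ROOT="$HOME/jena/apache-jena-fuseki-3.17.0/run/databases"
LOG="$HOME/log.txt"   # fixed path, so the log is not written into a database directory

handle_dataset()
{
	echo "starting to compact $(date) $1" >>"$LOG"
	# The POST only starts an asynchronous compaction task and returns at once,
	# so the directories below may still be in use by that task.
	curl -i -XPOST "localhost:3031/\$/compact/$1"

	# Keep the last Data-* directory in glob order and delete the others.
	count=$(ls -d "$DB_ROOT/$1"/Data-* | wc -l)
	for FILE in "$DB_ROOT/$1"/Data-*; do
		if [ "$count" -gt 1 ]
		then
			echo "deleting $FILE" >>"$LOG"
			rm -rf "$FILE"
			count=$((count - 1))
		fi
	done
}

while :
do
	sleep 1h
	handle_dataset pxmeta_hub_fed
	handle_dataset pxmeta_hub_fed_prod
done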
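In the script above, the compact request only starts an asynchronous task, so the Data-* directories are counted and deleted while that compaction may still be running. A hedged sketch of a cleanup step meant to run only after the compaction task is known to have finished (for example by polling Fuseki's task status). It assumes TDB2's zero-padded Data-0001, Data-0002, ... directory names, so that the lexically last directory is the current one; prune_old_data_dirs is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: remove every Data-NNNN directory of a TDB2 database except the
# newest one. Intended to run only after compaction has finished.
prune_old_data_dirs() {
    db_dir=$1
    # Zero-padded names sort correctly, so the last entry is the newest.
    newest=$(ls -d "$db_dir"/Data-* 2>/dev/null | sort | tail -n 1)
    [ -n "$newest" ] || return 0     # no Data-* directories: nothing to do
    for d in "$db_dir"/Data-*; do
        if [ "$d" != "$newest" ]; then
            echo "deleting $d"
            rm -rf -- "$d"
        fi
    done
}
```

For example: `prune_old_data_dirs "$HOME/jena/apache-jena-fuseki-3.17.0/run/databases/pxmeta_hub_fed"`, called once the compaction task for that dataset has completed.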
