Hello,

I've been trying TDB2 with compaction. I have two TDB2 datasets in my jena-fuseki, and about 50 MB of data is uploaded to each of them every 5 minutes.

At the same time they are compacted hourly by the attached script.

At some point I start getting these messages:

  + curl -i -XPOST 'localhost:8061/$/compact/pxmeta_hub_fed_prod'
     HTTP/1.1 400 Bad Request
     Date: Tue, 30 Mar 2021 23:54:47 GMT
     Fuseki-Request-Id: 2706
     Content-Type: text/plain;charset=utf-8
     Cache-Control: must-revalidate,no-cache,no-store
     Pragma: no-cache
     Content-Length: 59

     Async task request rejected - exceeds the limit of 4 tasks

After this, the dataset in question no longer returns results for queries run from the UI.

What is wrong now?

Br, Jaana


Andy Seaborne wrote on 9.3.2021 19:58:
Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:
hello,

I've run into the following problem with jena-fuseki (should I create a bug ticket?):

We need to update a jena-fuseki dataset every 5 minutes with a 50 MB ttl-file.

How many triples?
And is it new data that replaces the old data, or is it in addition to the existing data?

This causes the memory consumption on the machine where jena-fuseki is running to increase by gigabytes.

This was first detected with jena-fuseki 3.8 and later with jena-fuseki 3.17.

To be exact, I ran blankdots/jena-fuseki:fuseki3.17.0 in a Docker container, continuously posting that ttl-file into the same dataset (pxmeta_hub_fed_prod).

This is a TDB1 database?

TDB2 is better at this - the database still grows but there is a way
to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html

The database grows for two reasons. First, it allocates space in sparse files
in 8M chunks, but that space does not show up in du until it is actually used.
Second, the space for deleted data is not fully recycled across transactions,
because it may still be in use by a concurrent operation. (In TDB1, block
reference counting would be very difficult; in TDB2 the solution is
compaction.)

    Andy


See the output of the command "du -h | sort -hr | head -30" below. Attached is the shell script that I was executing during that time period.

root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data

root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#


3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana
#!/bin/sh
set -x

handle_dataset()
{
	echo "starting to compact $date $1" >>log.txt
	curl -i -XPOST "localhost:3031/\$/compact/$1"
	cd ~/jena/apache-jena-fuseki-3.17.0/run/databases/"$1" || return
	count=$(ls | grep -c Data-)

	for FILE in Data-*; do
		if [ "$count" -gt 1 ]
		then
			echo "deleting $FILE" >>log.txt
			rm -rf "$FILE"
			count=$(( count - 1 ))
		fi
	done
}

while :
do
	sleep 1h
	date=$(date)
	handle_dataset pxmeta_hub_fed
	handle_dataset pxmeta_hub_fed_prod
done
