Hello,
I've been trying TDB2 with compaction. I have two TDB2 datasets in my
jena-fuseki instance. Each of them receives a 50 MB upload every 5 minutes.
At the same time, they are compacted hourly by the attached script.
At some point I start getting these messages:
+ curl -i -XPOST 'localhost:8061/$/compact/pxmeta_hub_fed_prod'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    59  100    59    0     0  59000      0 --:--:-- --:--:-- --:--:-- 59000
HTTP/1.1 400 Bad Request
Date: Tue, 30 Mar 2021 23:54:47 GMT
Fuseki-Request-Id: 2706
Content-Type: text/plain;charset=utf-8
Cache-Control: must-revalidate,no-cache,no-store
Pragma: no-cache
Content-Length: 59
Async task request rejected - exceeds the limit of 4 tasks
After this, the dataset in question no longer returns results for queries
run from the UI.
What is going wrong?
Br, Jaana
Andy Seaborne wrote on 9.3.2021 19:58:
Hi Jaana,
On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:
Hello,
I've run into the following problem with jena-fuseki (should I create a bug
ticket?):
We need to update a jena-fuseki dataset every 5 minutes with a 50 MB
TTL file.
How many triples?
And is it new data to replace the old data, or in addition to the
existing data?
This causes memory consumption on the machine where jena-fuseki is
running to increase by gigabytes.
This was first detected with jena-fuseki 3.8 and later with jena-fuseki
3.17.
To be exact, I ran blankdots/jena-fuseki:fuseki3.17.0 in a Docker
container, continuously posting that TTL file into the same dataset
(pxmeta_hub_fed_prod).
This is a TDB1 database?
TDB2 is better at this - the database still grows but there is a way
to compact the database live.
JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html
The database grows for two reasons. First, it allocates space in sparse files
in 8M chunks, but that space does not count in du until it is actually used.
Second, the space for deleted data is not fully recycled across transactions
because it may be in use by a concurrent operation. (Block reference counting
would be very difficult to do in TDB1; in TDB2 the solution is compaction.)
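The sparse-file point can be seen outside Jena. The following is a minimal sketch, assuming GNU coreutils (truncate, and du with --apparent-size) and a filesystem that supports sparse files. The file name sparse.dat is purely illustrative and has nothing to do with TDB2's own file layout.

```shell
#!/bin/sh
# Create an 8 MiB sparse file and compare its apparent size with the
# space that is actually allocated on disk.
tmpdir=$(mktemp -d)
sparse="$tmpdir/sparse.dat"

# truncate extends the file length without writing any data blocks.
truncate -s 8M "$sparse"

# Apparent size: the length of the file as seen by readers.
apparent_kb=$(du -k --apparent-size "$sparse" | cut -f1)
# Allocated size: what plain du reports, i.e. blocks actually in use.
allocated_kb=$(du -k "$sparse" | cut -f1)

echo "apparent: ${apparent_kb} KiB, allocated: ${allocated_kb} KiB"

rm -rf "$tmpdir"
```

Running du with and without --apparent-size on a database directory shows how much of the reported size is reserved but not yet used.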
Andy
See the output of the command "du -h | sort -hr | head -30" below. Attached is
the shell script that I was running during that period.
root@3d53dc3fdf8d:/# alias du3="du -h | sort -hr | head -30"
root@3d53dc3fdf8d:/# du3
9.0G .
8.5G ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G ./data/fuseki/databases
8.5G ./data/fuseki
8.5G ./data
root@3d53dc3fdf8d:/# date
Tue Mar 9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#
3.5G .
3.0G ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G ./data/fuseki/databases
3.0G ./data/fuseki
3.0G ./data
root@3d53dc3fdf8d:/# date
Tue Mar 9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#
Br, Jaana
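#!/bin/sh
set -x

# Log to an absolute path so that entries do not end up in whatever
# directory handle_dataset has changed into.
log="$PWD/log.txt"

# Request compaction of dataset $1, then delete all but the newest Data-* directory.
# Note: the compact request runs as an asynchronous task, so the deletion
# below may run before the compaction has finished.
handle_dataset()
{
    echo "starting to compact $(date) $1" >>"$log"
    # Quote the URL and escape the "$" so the shell passes "$/" through literally.
    curl -i -XPOST "localhost:3031/\$/compact/$1"
    cd ~/jena/apache-jena-fuseki-3.17.0/run/databases/"$1" || return 1
    count=$(ls | grep -c Data-)
    for FILE in Data-*; do
        if [ "$count" -gt 1 ]
        then
            echo "deleting $FILE" >>"$log"
            rm -rf "$FILE"
            count=$(( count - 1 ))
        fi
    done
}

while :
do
    sleep 1h
    handle_dataset pxmeta_hub_fed
    handle_dataset pxmeta_hub_fed_prod
done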
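The "exceeds the limit of 4 tasks" rejection quoted at the top of this thread indicates that several asynchronous tasks were still outstanding when a new compaction was requested. One way to check before posting another request is to look at the task list. This is a minimal sketch, assuming the Fuseki admin endpoint GET /$/tasks returns a JSON array in which every task object has a "taskId" field and gains a "finished" field once it completes. count_unfinished is a hypothetical helper, not part of Fuseki, and the text matching is deliberately crude.

```shell
#!/bin/sh
# Read a /$/tasks JSON listing on stdin and print how many tasks
# have no "finished" timestamp yet.
# Assumption: one "taskId" per task, and "finished" appears only on completion.
count_unfinished() {
    json=$(cat)
    total=$(printf '%s' "$json" | grep -o '"taskId"' | wc -l)
    finished=$(printf '%s' "$json" | grep -o '"finished"' | wc -l)
    echo $(( total - finished ))
}

# Hypothetical usage against a running server (port taken from the script above):
#   pending=$(curl -s 'localhost:3031/$/tasks' | count_unfinished)
#   if [ "$pending" -eq 0 ]; then
#       curl -i -XPOST 'localhost:3031/$/compact/pxmeta_hub_fed_prod'
#   fi
```

Skipping the compaction request while tasks are pending, and deleting old Data-* directories only after the task reports as finished, would avoid stacking requests. Whether that resolves the unresponsive dataset is not established here.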