On 28/04/2021 22:36, Brandon Sara wrote:
My Setup:
I’m running a few fuseki servers via Docker containers. I need the storage to 
be persistent across container restarts, so I’m using TDB2 for my storage. The 
TDB2 databases are stored on a volume that is mounted into the Docker containers.

What is the storage for the database? EBS disk? EBS SSD?

This volume is part of our S3 instance. The Fuseki servers’ individual DBs are 
kept in sync using RDF-Delta. The dataset in question uses full-text search 
via jena-text (Lucene), with two properties indexed (though they occur 
often in the dataset). The reasoner being used is `TransitiveReasoner`. I have 
only one default graph and no other graphs.

My Problem:
Uploading ~10 MB of data (as a Turtle .ttl file) sometimes takes more 
than 3 hours to complete! We tried turning off full-text search and it cut the 
time roughly in half.
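For scale, the timings above work out as follows, assuming decimal megabytes (10 MB = 10 × 10^6 bytes) and treating "~half" as exactly 1.5 hours of the 3:

$$\frac{10 \times 10^{6}\ \text{bytes}}{3 \times 3600\ \text{s}} \approx 926\ \text{B/s overall}, \qquad \frac{10 \times 10^{6}\ \text{bytes}}{1.5 \times 3600\ \text{s}} \approx 1.85\ \text{kB/s for the indexing share}$$

Either rate is far below what even a slow disk sustains for sequential writes, which is why the numbers point at per-operation latency rather than raw bandwidth.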

OK, so indexing is costing 1.5 hours, which is a long time and suggests the storage is very slow. What is the Lucene index stored on? The same storage as the TDB2 database?
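One way to check whether the storage itself is slow is to time small synchronous writes on the mounted volume: database and index commits generally end with data being forced to stable storage, so a high per-force latency multiplies across every commit. The following is a minimal, hypothetical probe using only the JDK (the class `FsyncProbe`, the method `meanForceMillis`, and the 200 × 4 KiB workload are illustrative choices, not part of Jena or Fuseki). Run it once against the mounted volume and once against local disk and compare.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FsyncProbe {

    /**
     * Hypothetical helper: appends `writes` records of `recordBytes` bytes to a
     * temporary file in `dir`, forcing each one to storage, and returns the mean
     * time per write+force in milliseconds. The temp file is deleted afterwards.
     */
    public static double meanForceMillis(Path dir, int writes, int recordBytes) throws IOException {
        Path file = Files.createTempFile(dir, "fsync-probe", ".dat");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer record = ByteBuffer.allocate(recordBytes);
            long totalNanos = 0;
            for (int i = 0; i < writes; i++) {
                record.clear(); // reuse the same zero-filled buffer for each record
                long start = System.nanoTime();
                while (record.hasRemaining()) {
                    ch.write(record);
                }
                ch.force(true); // flush data and metadata, roughly what a commit does
                totalNanos += System.nanoTime() - start;
            }
            return totalNanos / 1e6 / writes;
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: pass the mounted volume's path as the first argument;
        // otherwise the JVM's temp directory is used.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        double ms = meanForceMillis(dir, 200, 4096);
        System.out.printf("mean write+force: %.3f ms over 200 x 4 KiB%n", ms);
    }
}
```

If the mounted volume shows tens of milliseconds per forced write where local disk shows well under one, the storage latency alone could explain much of the slowdown.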

If it is a single file, the S3 write is going to be a single commit and a single S3 block. S3 isn't a filesystem but

But still, 1.5 hours for only 10 MB of triple data is waaaay too long. Does 
anyone have any ideas on how we could fix this issue (other than the obvious: not 
using a network-connected disk)?

How many triples is 10 MB? And how many are indexed into Lucene?

I think you'll need to experiment with simplified setups to see where the time is going. This includes making sure the JVM isn't spending a lot of its time on heap management (garbage collection).
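Following that advice, one could run the same load under several simplified configurations and record both wall-clock time and JVM garbage-collection time for each. The following is a JDK-only sketch; the class `LoadExperiment`, the setup names, and the `simulateWork` bodies are hypothetical placeholders, to be replaced by real loads (for example, the same .ttl file into TDB2 on local disk versus the mounted volume, with and without jena-text).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class LoadExperiment {

    /** Wall-clock and garbage-collection time for one setup, in milliseconds. */
    public static final class Timing {
        public final long wallMillis;
        public final long gcMillis;

        Timing(long wallMillis, long gcMillis) {
            this.wallMillis = wallMillis;
            this.gcMillis = gcMillis;
        }
    }

    /** Sums the accumulated collection time (ms) of all collectors in this JVM. */
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                sum += t; // -1 means the collector does not report this value
            }
        }
        return sum;
    }

    /** Runs each named setup once, in insertion order, and records its timings. */
    public static Map<String, Timing> timeSetups(Map<String, Runnable> setups) {
        Map<String, Timing> results = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> e : setups.entrySet()) {
            long gcBefore = totalGcMillis();
            long start = System.nanoTime();
            e.getValue().run();
            long wall = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            results.put(e.getKey(), new Timing(wall, totalGcMillis() - gcBefore));
        }
        return results;
    }

    /** Stand-in for a real load; sleeps for the given number of milliseconds. */
    private static void simulateWork(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<String, Runnable> setups = new LinkedHashMap<>();
        // Hypothetical placeholders: replace each body with a real load of the same file.
        setups.put("local disk, no text index", () -> simulateWork(50));
        setups.put("mounted volume, no text index", () -> simulateWork(100));
        setups.put("mounted volume, with text index", () -> simulateWork(200));
        timeSetups(setups).forEach((name, t) ->
            System.out.printf("%-32s wall=%d ms, gc=%d ms%n", name, t.wallMillis, t.gcMillis));
    }
}
```

If GC time is a large share of a row's wall time, heap sizing is worth looking at first; if the mounted-volume rows dominate regardless of GC, the storage is the more likely culprit.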

    Andy


Thanks.
