Re: [MASSMAIL]Re: Large *.dat files in Fuseki

Bartalus Gáspár Wed, 06 Jul 2022 01:36:17 -0700

Hi Lorenz,

Thanks for the quick feedback and the clarification on Lucene indexes.


Here are my answers to your questions:
- We are uploading 7 ttl files to our dataset, where one is larger (6 MB) and 
the others are below 200 KB.
- The overall number of triples after data upload is ~150000.
- We have around 10 SPARQL UPDATE queries that are executed on a regular and 
frequent basis, i.e. every 5 seconds. We also have 5 such queries that are 
executed each minute. But most of the time they do not produce any outcome, 
i.e. the dataset is not altered, and when they do, there are just a couple of 
triples that are added to the dataset.
- These *.dat files start from ~10 MB in size, and after a day or so some of 
them grow to ~10 GB.
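
To put numbers on the growth described above, the sizes of the *.dat files can be snapshotted at intervals and compared. This is only a minimal sketch: the helper names `dat_file_sizes` and `growth` are made up for illustration, and the database directory path in the usage note is an assumption that depends on how Fuseki was started.

```python
from pathlib import Path


def dat_file_sizes(db_dir):
    """Return {relative path: size in bytes} for every *.dat file under db_dir."""
    root = Path(db_dir)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*.dat"))
    }


def growth(before, after):
    """Return the per-file size change in bytes between two snapshots.

    Files that are new in `after` are counted as having grown from 0 bytes.
    """
    return {name: size - before.get(name, 0) for name, size in after.items()}
```

Usage would be something like taking `before = dat_file_sizes("run/databases/db_name")` (hypothetical path), waiting a few update cycles, taking a second snapshot, and printing `growth(before, after)` to see which files grow and by how much.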

We have ~300 blank nodes, and ~half of the triples have a literal in the object 
position, so ~75000.
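
Regarding the compact operation asked about in the quoted message: for a TDB2 dataset it can be triggered over HTTP. The following is a minimal sketch, assuming the server exposes the Fuseki admin endpoint `/$/compact/{name}` with the optional `deleteOld` parameter (availability depends on the Fuseki version), that the server runs on localhost port 3030, and that the dataset is called `db_name` as in the original message. The function names are made up for illustration.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def compact_request(base_url, dataset, delete_old=True):
    """Build (but do not send) a POST request asking Fuseki to compact a dataset.

    Assumes the admin endpoint /$/compact/{name} is enabled on the server.
    """
    url = f"{base_url.rstrip('/')}/$/compact/{quote(dataset, safe='')}"
    if delete_old:
        # Assumption: the server version supports removing the old storage
        # generation after compaction via this query parameter.
        url += "?deleteOld=true"
    return Request(url, data=b"", method="POST")


def run_compaction(base_url, dataset, delete_old=True):
    """Send the compaction request and return the HTTP status code."""
    with urlopen(compact_request(base_url, dataset, delete_old)) as resp:
        return resp.status
```

Calling `run_compaction("http://localhost:3030", "db_name")` would start the compaction; comparing the directory size before and after shows how much of the *.dat growth is reclaimable.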

Best regards,
Gaspar



> On 6 Jul 2022, at 10:55, Lorenz Buehmann <[email protected]> 
> wrote:
> 
> Hi and welcome Gaspar.
> 
> 
> Those files do contain the node tables.
> 
> A Lucene index is never computed by default and would be contained in Lucene 
> specific index files.
> 
> 
> Can you give some details about the
> 
> - size of the files
> - the number of triples
> - the number of triples added/removed/changed
> - the frequency of updates
> - how much the files grow
> - what kind of data you insert? Lots of blank nodes? Or literals?
> 
> Also, did you try a compact operation at any point?
> 
> Lorenz
> 
> On 06.07.22 09:40, Bartalus Gáspár wrote:
>> Hi Jena support team,
>> 
>> We are experiencing an issue with Jena Fuseki databases. In the databases 
>> folder we see some files called SPO.dat, OSP.dat, etc., and the size of 
>> these files is growing quickly. From our understanding, these files 
>> contain the Lucene indexes. We have two questions:
>> 
>> 1. Why are these files growing rapidly, although the underlying data 
>> (triples) are not being changed, or only slightly changed?
>> 2. Can we disable indexing easily, since we are not using full text searches 
>> in our SPARQL queries?
>> 
>> Our usage of Jena Fuseki:
>> 
>> * Start the server with `fuseki-server --port 3030`
>> * Create databases with HTTP POST to 
>> `/$/datasets?state=active&dbType=tdb2&dbName=db_name`
>> * Upload ttl files with HTTP POST to /db_name/data
>> 
>> Thanks in advance for your feedback, and if you need more input from 
>> our side, please let me know.
>> 
>> Best regards,
>> Gaspar Bartalus
>>
