Re: [MASSMAIL]Re: Large *.dat files in Fuseki

Bartalus Gáspár Thu, 07 Jul 2022 01:53:31 -0700

Hi Lorenz,

Would you recommend using TDB1 instead of TDB2 for our use case? What would 
the differences be?
We are using Fuseki 4.5.0, by the way.


Gaspar

> On 6 Jul 2022, at 14:39, Bartalus Gáspár 
> <bartalus.gas...@codespring.ro.INVALID> wrote:
> 
> Hi,
> 
> Most of the updates are DELETE/INSERT queries, i.e.:
> 
> DELETE {?s ?p ?oldValue}
> INSERT {?s ?p ?newValue}
> WHERE {
>  OPTIONAL {?s ?p ?oldValue}
>  #derive ?newValue from somewhere
> }
> 
> We also have some separate DELETE queries and INSERT queries.
> 
> I’ve tried HTTP POST /$/compact/db_name, and as a result the files shrink 
> back to normal size. However, as far as I can tell, the old files are also 
> kept. This is the folder structure I see:
> - databases/db_name/Data-0001 - with the old large files
> - databases/db_name/Data-0002 - presumably the result of the compact 
> operation, with normal file sizes.
> 
> Is there also some operation (http or cli) that would keep only one (the 
> latest) data folder, i.e. delete the old files from Data-0001?
> 
> Gaspar
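
For the compaction call described above, a minimal Python sketch of the admin request might look like the following. The host `localhost:3030` and the dataset name `db_name` are placeholders, and the helper `compact_request` is hypothetical. The `deleteOld=true` query parameter is what the Fuseki administration documentation describes for removing the previous `Data-NNNN` directory after compaction; whether a given server version (4.5.0 here) honours it is an assumption that should be checked against that version's documentation.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def compact_request(base_url: str, db_name: str, delete_old: bool = False) -> Request:
    """Build (but do not send) a POST request for Fuseki's compact endpoint."""
    url = f"{base_url.rstrip('/')}/$/compact/{quote(db_name, safe='')}"
    if delete_old:
        # Assumption: this server version accepts ?deleteOld=true and then
        # removes the previous Data-NNNN directory once compaction succeeds.
        url += "?" + urlencode({"deleteOld": "true"})
    # An empty body with method="POST" gives a plain POST with no payload.
    return Request(url, data=b"", method="POST")


if __name__ == "__main__":
    req = compact_request("http://localhost:3030", "db_name", delete_old=True)
    print(req.get_method(), req.full_url)
    # To run it against a live server, pass `req` to urllib.request.urlopen().
```

Building the request separately from sending it makes it easy to inspect the URL before touching a production database.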
> 
>> On 6 Jul 2022, at 12:52, Lorenz Buehmann 
>> <buehm...@informatik.uni-leipzig.de> wrote:
>> 
>> Ok, interesting
>> 
>> so
>> 
>> we have
>> 
>> - 150k triples, rather small dataset
>> 
>> - loaded into 10MB node table files
>> 
>> - 10 updates every 5s
>> 
>> - which makes up to 24 * 60 * 60 / 5 * 10 = 172,800 (roughly 200k) updates per day
>> 
>> - and leads to 10GB node table files
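
The back-of-the-envelope figures in this list can be reproduced with a few lines of Python. This is only a rough sketch: it counts just the ten queries run every 5 seconds (ignoring the five per-minute ones), treats "a day or so" as exactly one day, uses decimal units, and assumes growth is spread evenly over all update requests, none of which the thread confirms.

```python
SECONDS_PER_DAY = 24 * 60 * 60
INTERVAL_S = 5             # the frequent queries run every 5 seconds
QUERIES_PER_INTERVAL = 10  # about ten update queries per interval

updates_per_day = SECONDS_PER_DAY // INTERVAL_S * QUERIES_PER_INTERVAL  # 172800

# Growth of one file from ~10 MB to ~10 GB (decimal units assumed).
growth_bytes = 10 * 1000**3 - 10 * 1000**2

# Average growth attributed to each update request, under the assumptions above.
bytes_per_update = growth_bytes / updates_per_day

print(f"{updates_per_day} updates per day")
print(f"about {bytes_per_update / 1000:.1f} kB of file growth per update")
```

Under these assumptions each update request accounts for a few tens of kilobytes of growth, even when it changes no triples.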
>> 
>> 
>> Can you share the shape of those update queries?
>> 
>> 
>> After a "compact" operation, do the files go back to their "normal" size?
>> 
>> 
>> On 06.07.22 10:36, Bartalus Gáspár wrote:
>>> Hi Lorenz,
>>> 
>>> Thanks for the quick feedback and the clarification on Lucene indexes.
>>> 
>>> Here are my answers to your questions:
>>> - We are uploading 7 ttl files to our dataset, where one is larger (~6 MB) 
>>> and the others are below 200 KB.
>>> - The overall number of triples after data upload is ~150000.
>>> - We have around 10 SPARQL UPDATE queries that are executed on a regular 
>>> and frequent basis, i.e. every 5 seconds. We also have 5 such queries that 
>>> are executed each minute. But most of the time they do not produce any 
>>> outcome, i.e. the dataset is not altered, and when they do, there are just 
>>> a couple of triples that are added to the dataset.
>>> - These *.dat files start at ~10 MB in size, and after a day or so some of 
>>> them grow to ~10 GB.
>>> 
>>> We have ~300 blank nodes, and ~half of the triples have a literal in the 
>>> object position, so ~75000.
>>> 
>>> Best regards,
>>> Gaspar
>>> 
>>> 
>>> 
>>>> On 6 Jul 2022, at 10:55, Lorenz Buehmann 
>>>> <buehm...@informatik.uni-leipzig.de> wrote:
>>>> 
>>>> Hi and welcome Gaspar.
>>>> 
>>>> 
>>>> Those files do contain the node tables.
>>>> 
>>>> A Lucene index is never computed by default and would be contained in 
>>>> Lucene specific index files.
>>>> 
>>>> 
>>>> Can you give some details about the
>>>> 
>>>> - size of the files
>>>> - the number of triples
>>>> - the number of triples added/removed/changed
>>>> - the frequency of updates
>>>> - how much the files grow
>>>> - what kind of data you insert? Lots of blank nodes? Or literals?
>>>> 
>>>> Also, did you try a compact operation at any point?
>>>> 
>>>> Lorenz
>>>> 
>>>> On 06.07.22 09:40, Bartalus Gáspár wrote:
>>>>> Hi Jena support team,
>>>>> 
>>>>> We are experiencing an issue with Jena Fuseki databases. In the databases 
>>>>> folder we see some files called SPO.dat, OSP.dat, etc., and the size of 
>>>>> these files is growing quickly. As we understand it, these files contain 
>>>>> the Lucene indexes. We have two questions:
>>>>> 
>>>>> 1. Why are these files growing rapidly, although the underlying data 
>>>>> (triples) are not being changed, or only slightly changed?
>>>>> 2. Can we disable indexing easily, since we are not using full text 
>>>>> searches in our SPARQL queries?
>>>>> 
>>>>> Our usage of Jena Fuseki:
>>>>> 
>>>>> * Start the server with `fuseki-server --port 3030`
>>>>> * Create databases with HTTP POST to 
>>>>> `/$/datasets?state=active&dbType=tdb2&dbName=db_name`
>>>>> * Upload ttl files with HTTP POST to /db_name/data
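
The setup steps listed above could be scripted roughly as follows. This is a sketch under stated assumptions, not the poster's actual tooling: the server is assumed to run on `localhost:3030`, the helper names `create_dataset_request` and `upload_turtle_request` are hypothetical, and the upload is assumed to send raw Turtle with a `text/turtle` content type (the thread does not say how the files were posted).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Fuseki was started with `fuseki-server --port 3030` on this host.
BASE = "http://localhost:3030"


def create_dataset_request(db_name: str) -> Request:
    """Build the POST that creates an active TDB2 dataset."""
    query = urlencode({"state": "active", "dbType": "tdb2", "dbName": db_name})
    return Request(f"{BASE}/$/datasets?{query}", data=b"", method="POST")


def upload_turtle_request(db_name: str, turtle: bytes) -> Request:
    """Build the POST that adds Turtle data to the dataset's default graph."""
    return Request(
        f"{BASE}/{db_name}/data",
        data=turtle,
        headers={"Content-Type": "text/turtle"},  # assumed upload format
        method="POST",
    )


if __name__ == "__main__":
    sample = b"<http://example.org/s> <http://example.org/p> 1 .\n"
    for req in (create_dataset_request("db_name"),
                upload_turtle_request("db_name", sample)):
        print(req.get_method(), req.full_url)
    # Each request can be sent with urllib.request.urlopen(req) on a live server.
```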
>>>>> 
>>>>> Thanks in advance for your feedback, and if you need more input from 
>>>>> our side, please let me know.
>>>>> 
>>>>> Best regards,
>>>>> Gaspar Bartalus
>>>>> 
>
