Hi Lorenz,

Would you recommend using TDB1 instead of TDB2 for our use case? What would be the differences? We are using Fuseki 4.5.0, btw.

Gaspar

> On 6 Jul 2022, at 14:39, Bartalus Gáspár
> <bartalus.gas...@codespring.ro.INVALID> wrote:
>
> Hi,
>
> Most of the updates are DELETE/INSERT queries, i.e.
>
> DELETE {?s ?p ?oldValue}
> INSERT {?s ?p ?newValue}
> WHERE {
>   OPTIONAL {?s ?p ?oldValue}
>   #derive ?newValue from somewhere
> }
>
> We also have some separate DELETE queries and INSERT queries.
>
> I’ve tried HTTP POST /$/compact/db_name and as a result the files are getting
> back to normal size. However, as far as I can tell the old files are also
> kept. This is the folder structure I see:
> - databases/db_name/Data-0001 - with the old large files
> - databases/db_name/Data-0002 - presumably the result of the compact
>   operation with normal file sizes.
>
> Is there also some operation (http or cli) that would keep only one (the
> latest) data folder, i.e. delete the old files from Data-0001?
>
> Gaspar
>
>> On 6 Jul 2022, at 12:52, Lorenz Buehmann
>> <buehm...@informatik.uni-leipzig.de> wrote:
>>
>> Ok, interesting
>>
>> so
>>
>> we have
>>
>> - 150k triples, rather small dataset
>> - loaded into 10MB node table files
>> - 10 updates every 5s
>> - which makes up to 24 * 60 * 60 / 5 * 10 ~ 200k updates per day
>> - and leads to 10GB node table files
>>
>> Can you share the shape of those update queries?
>>
>> After doing a "compact" operation, the files are getting back to "normal"
>> size?
>>
>> On 06.07.22 10:36, Bartalus Gáspár wrote:
>>> Hi Lorenz,
>>>
>>> Thanks for quick feedback and clarification on lucene indexes.
>>>
>>> Here are my answers to your questions:
>>> - We are uploading 7 ttl files to our dataset, where 1 is larger 6Mb, the
>>>   others are below 200Kb.
>>> - The overall number of triples after data upload is ~150000.
>>> - We have around 10 SPARQL UPDATE queries that are executed on a regular
>>>   and frequent basis, i.e. every 5 seconds. We also have 5 such queries that
>>>   are executed each minute. But most of the time they do not produce any
>>>   outcome, i.e. the dataset is not altered, and when they do, there are just
>>>   a couple of triples that are added to the dataset.
>>> - These *.dat files start from ~10Mb in size, and after a day or so some of
>>>   them grow to ~10Gb.
>>>
>>> We have ~300 blank nodes, and ~half of the triples have a literal in the
>>> object position, so ~75000.
>>>
>>> Best regards,
>>> Gaspar
>>>
>>>> On 6 Jul 2022, at 10:55, Lorenz Buehmann
>>>> <buehm...@informatik.uni-leipzig.de> wrote:
>>>>
>>>> Hi and welcome Gaspar.
>>>>
>>>> Those files do contain the node tables.
>>>>
>>>> A Lucene index is never computed by default and would be contained in
>>>> Lucene specific index files.
>>>>
>>>> Can you give some details about the
>>>>
>>>> - size of the files
>>>> - the number of triples
>>>> - the number triples added/removed/changed
>>>> - the frequency of updates
>>>> - how much the files grow
>>>> - what kind of data you insert? Lots of blank nodes? Or literals?
>>>>
>>>> Also, did you try a compact operation during time?
>>>>
>>>> Lorenz
>>>>
>>>> On 06.07.22 09:40, Bartalus Gáspár wrote:
>>>>> Hi Jena support team,
>>>>>
>>>>> We are experiencing an issue with Jena Fuseki databases. In the databases
>>>>> folder we see some files called SPO.dat, OSP.dat, etc., and the size of
>>>>> these files are growing quickly. From our understanding these files are
>>>>> containing the Lucene indexes. We would have two questions:
>>>>>
>>>>> 1. Why are these files growing rapidly, although the underlying data
>>>>>    (triples) are not being changed, or only slightly changed?
>>>>> 2. Can we disable indexing easily, since we are not using full text
>>>>>    searches in our SPARQL queries?
>>>>>
>>>>> Our usage of Jena Fuseki:
>>>>>
>>>>> * Start the server with `fuseki-server --port 3030`
>>>>> * Create databases with HTTP POST to
>>>>>   `/$/datasets?state=active&dbType=tdb2&dbName=db_name`
>>>>> * Upload ttl files with HTTP POST to /db_name/data
>>>>>
>>>>> Thanks in advance for your feedback, and if you’d require more input from
>>>>> our side, please let me know.
>>>>>
>>>>> Best regards,
>>>>> Gaspar Bartalus
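
For reference, the compact call and the update-rate estimate from the thread could be scripted roughly as below. This is only a sketch under assumptions: the host and port (localhost:3030, taken from the start command) and the helper names compact_url, request_compaction and updates_per_day are made up for illustration; only the /$/compact/db_name path comes from the thread itself.

```python
import urllib.request

# Assumption: Fuseki runs locally on the port used in the start command.
FUSEKI_BASE = "http://localhost:3030"


def compact_url(db_name: str, base: str = FUSEKI_BASE) -> str:
    """Build the admin URL for the compact operation quoted in the thread."""
    return f"{base}/$/compact/{db_name}"


def request_compaction(db_name: str) -> int:
    """Send an empty HTTP POST to /$/compact/<db_name>; return the status code."""
    req = urllib.request.Request(compact_url(db_name), data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def updates_per_day(queries: int, interval_seconds: int) -> int:
    """Executions per day when `queries` updates run every `interval_seconds`."""
    return 24 * 60 * 60 // interval_seconds * queries


if __name__ == "__main__":
    # 10 queries every 5 s plus 5 queries every 60 s, as described in the thread.
    print(updates_per_day(10, 5) + updates_per_day(5, 60))
    print(compact_url("db_name"))
```

Calling request_compaction("db_name") against a running server would trigger the same operation as the manual POST; per the thread, the previous Data-NNNN folder is still kept afterwards.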