Without knowing anything about the contents of those files, it is hard to say whether those numbers are expected; there is no general rule of thumb for how big the database should be relative to the input data. It depends heavily on the contents of the input data, how it was loaded, and so on.
Was each of these files uploaded separately, i.e. as a separate transaction? You could try compacting the database (https://jena.apache.org/documentation/tdb2/tdb2_admin.html) to see if that helps; example commands are sketched after the quoted message below.

TDB2 is implemented using copy-on-write data structures, so each new write transaction expands the database: it copies existing data blocks before modifying them, because ongoing read transactions may still need the original blocks. A compaction rewrites the database to keep only the current blocks, discarding all the old blocks that are no longer referenced by the current state of the database. Compaction requires an exclusive write lock on the database, so it can only be run during server downtime or quiet periods.

Given 47GB of total input data, there was probably a lot of copy-on-write churn during the load, and I'd expect compaction to bring the size down substantially.

Hope this helps,

Rob

From: Francesco Bruno <[email protected]>
Date: Monday, 10 February 2025 at 10:22
To: [email protected] <[email protected]>
Subject: Question Regarding Large Index Size in Fuseki

Dear Apache Jena Team,

We recently uploaded 18 TTL files totaling 47GB to our Fuseki instance. However, we noticed that the resulting index size is significantly larger, around 296GB. We have deactivated the GSPO, GPOS, and GOSP indexes, yet the size remains quite large.

Could you confirm whether this is expected behavior? Are there any optimizations or configurations we could apply to reduce the index size?

Thank you for your time and support.

Best regards,
Maria Pereira & Francesco Bruno
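As a rough sketch of the compaction step mentioned above: a TDB2 database can be compacted either offline with the tdb2.tdbcompact command-line tool, or on a live server through Fuseki's administration protocol. The path, port, and dataset name below (/data/DB2, localhost:3030, "ds") are placeholders for illustration; adjust them to your deployment, and note that the HTTP endpoint assumes the full Fuseki server with the admin interface enabled.

    # Offline: compact the TDB2 database directory directly.
    # The server must be stopped first; compaction writes a new
    # Data-NNNN generation inside the database directory.
    tdb2.tdbcompact --loc=/data/DB2

    # Online: ask a running Fuseki server to compact the dataset "ds".
    # deleteOld=true removes the superseded generation once the new
    # one is complete, reclaiming the disk space immediately.
    curl -X POST 'http://localhost:3030/$/compact/ds?deleteOld=true'

If an old Data-NNNN generation is left in place after compaction, it holds the pre-compaction state and can be removed once you are satisfied with the result.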
