Without knowing anything about the contents of those files, it is hard to say whether those numbers are expected; there is no general rule of thumb for how big the database should be relative to the input data. It depends heavily on the contents of the input data, how it was loaded, and so on.
Was each of these files uploaded separately, i.e. as a separate transaction? You could try compacting the database (https://jena.apache.org/documentation/tdb2/tdb2_admin.html) to see if that helps; example commands are sketched after the quoted message below.

TDB2 is implemented using copy-on-write data structures, so each new write transaction expands the database: it copies existing data blocks before modifying them, because ongoing read transactions may still need the original blocks. A compaction rewrites the database to keep only the current blocks, discarding all the old blocks that are no longer referenced by the current state of the database. Compaction requires an exclusive write lock on the database, so it can only be run during server downtime or quiet periods.

Given 47GB of total input data, there was probably a lot of copy-on-write churn during the load, and I'd expect compaction to bring the size down substantially.

Hope this helps,

Rob

From: Francesco Bruno <[email protected]>
Date: Monday, 10 February 2025 at 10:22
To: [email protected] <[email protected]>
Subject: Question Regarding Large Index Size in Fuseki

Dear Apache Jena Team,

We recently uploaded 18 TTL files totaling 47GB to our Fuseki instance. However, we noticed that the resulting index size is significantly larger, around 296GB. We have deactivated the GSPO, GPOS, and GOSP indexes, yet the size remains quite large.

Could you confirm whether this is expected behavior? Are there any optimizations or configurations we could apply to reduce the index size?

Thank you for your time and support.

Best regards,
Maria Pereira & Francesco Bruno
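As a rough sketch of the compaction step mentioned above: a TDB2 database can be compacted either offline with the tdb2.tdbcompact command-line tool, or on a live server through Fuseki's administration protocol. The path, port, and dataset name below (/data/DB2, localhost:3030, "ds") are placeholders for illustration; adjust them to your deployment, and note that the HTTP endpoint assumes the full Fuseki server with the admin interface enabled.

    # Offline: compact the TDB2 database directory directly.
    # The server must be stopped first; compaction writes a new
    # Data-NNNN generation inside the database directory.
    tdb2.tdbcompact --loc=/data/DB2

    # Online: ask a running Fuseki server to compact the dataset "ds".
    # deleteOld=true removes the superseded generation once the new
    # one is complete, reclaiming the disk space immediately.
    curl -X POST 'http://localhost:3030/$/compact/ds?deleteOld=true'

If an old Data-NNNN generation is left in place after compaction, it holds the pre-compaction state and can be removed once you are satisfied with the result.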
