It may well do.

The exact size of a database depends on the order in which the data was inserted. Insertion order changes how the B+Tree nodes split over their lifetime, so while the B+Tree holds the same data, the space used can differ. Repeated compaction should settle down to a stable size.

It may also depend on what exactly is being reported as the "file size". TDB2 uses sparse files - it allocates 8M chunks but does not use all of the space immediately. Different operating systems, and different tools on Linux, seem to report this differently: either allocated space or apparent (logical) space.
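As a quick illustration of the sparse-file effect (not TDB2-specific - the file name and sizes here are just for demonstration), `ls` reports the apparent size while `du` reports the blocks actually allocated on disk:

```shell
# Create an 8 MB sparse file: the apparent size is 8 MB,
# but no data blocks have been written yet.
truncate -s 8M sparse.dat

# Apparent (logical) size in bytes, as ls/stat report it:
ls -l sparse.dat          # shows 8388608 bytes

# Actual disk usage in KB, as du reports it:
du -k sparse.dat          # shows 0, or close to it

# du can be asked for the apparent size instead:
du -k --apparent-size sparse.dat   # shows 8192
```

So two tools looking at the same TDB2 database directory can legitimately report different totals.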

        Andy

On 06/04/2021 21:43, Brandon Sara wrote:
I have a very large dataset. Before compaction, it was ~51 GB. I ran
compaction (using tdb2.tdbcompact cli tool) and it dropped down to 6.7 GB.
I then wanted to see how long it would take to run compaction on an already
compacted dataset. After running it, it grew in size to 7.4 GB, then it
grew with every subsequent compaction until it reached 7.6 GB.

Is this a bug? Do I have something configured incorrectly? Would compaction
not cause the dataset to grow in size if I ran it via the fuseki webapp
/$/compact/* endpoint?

Jena Version: 3.17.0

Thanks.
