It may well do.

The exact size of a database depends on the order in which the data was inserted. Insertion order changes how the B+Tree nodes split over their lifetime, so while the B+Tree holds the same data, the space used can differ. Repeated compaction should settle down to a stable size.

It may also depend on what exactly is being reported as the "file size". TDB2 uses sparse files - it allocates 8M chunks but does not use all of the space immediately. Different operating systems, and different tools on Linux, seem to report this differently: either allocated space or apparent (logical) space.
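As a quick illustration of the sparse-file effect (not TDB2-specific - the file name and sizes here are just for demonstration), `ls` reports the apparent size while `du` reports the blocks actually allocated on disk:

```shell
# Create an 8 MB sparse file: the apparent size is 8 MB,
# but no data blocks have been written yet.
truncate -s 8M sparse.dat

# Apparent (logical) size in bytes, as ls/stat report it:
ls -l sparse.dat          # shows 8388608 bytes

# Actual disk usage in KB, as du reports it:
du -k sparse.dat          # shows 0, or close to it

# du can be asked for the apparent size instead:
du -k --apparent-size sparse.dat   # shows 8192
```

So two tools looking at the same TDB2 database directory can legitimately report different totals.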

        Andy

On 06/04/2021 21:43, Brandon Sara wrote:
I have a very large dataset. Before compaction, it was ~51 GB. I ran
compaction (using tdb2.tdbcompact cli tool) and it dropped down to 6.7 GB.
I then wanted to see how long it would take to run compaction on an already
compacted dataset. After running it, it grew in size to 7.4 GB, then it
grew with every subsequent compaction until it reached 7.6 GB.

Is this a bug? Do I have something configured incorrectly? Would compaction
not cause the dataset to grow in size if I ran it via the fuseki webapp
/$/compact/* endpoint?

Jena Version: 3.17.0

Thanks.
