Re: Compaction on already compacted dataset causes dataset to grow

Brandon Sara Wed, 07 Apr 2021 09:29:50 -0700

Thank you for the explanation and the quick response. This might be worth 
adding to the documentation (if it isn't already there).


> On Apr 7, 2021, at 6:48 AM, Andy Seaborne wrote:
> 
> It may well do.
> 
> The exact size of a database depends on the order in which it is created. 
> That order changes how the B+Tree nodes split over their lifetime, so while 
> the B+Tree holds the same data, the space used can differ. It should settle 
> down to the same size if compaction is done repeatedly.
> 
> It may also depend on what exactly is being reported as the "file size". 
> TDB2 uses sparse files: it allocates 8M chunks but does not use all of that 
> space immediately. Different operating systems, and different tools on 
> Linux, seem to report this differently, showing either allocated space or 
> used space.
> 
>       Andy
> 
> On 06/04/2021 21:43, Brandon Sara wrote:
>> I have a very large dataset. Before compaction, it was ~51 GB. I ran
>> compaction (using the tdb2.tdbcompact CLI tool) and it dropped to 6.7 GB.
>> I then wanted to see how long it would take to run compaction on an already
>> compacted dataset. After running it, the dataset grew to 7.4 GB, and it
>> kept growing with every subsequent compaction until it reached 7.6 GB.
>> Is this a bug? Do I have something configured incorrectly? Would compaction
>> avoid growing the dataset if I ran it via the Fuseki webapp's
>> /$/compact/* endpoint instead?
>> Jena Version: 3.17.0
>> Thanks.
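
Andy's first point, that identical data can occupy different amounts of space depending on how B+Tree nodes split as the data is inserted, can be illustrated with a toy model. This is a minimal sketch, not TDB2's B+Tree: it models only the leaf level, uses hypothetical helpers (`count_leaves`, `packed_leaves`) with a tiny leaf capacity, and assumes an overflowing leaf is simply split into two halves. It compares ascending inserts, shuffled inserts, and a fully packed bulk load of the same keys.

```python
import bisect
import random


def count_leaves(keys, capacity):
    """Insert keys one by one into a toy B+Tree leaf level and return the leaf count.

    Hypothetical simplified model (not TDB2's code): only leaves are tracked,
    keys are assumed distinct, and a leaf holding more than `capacity` keys is
    split into two halves.
    """
    leaves = [[]]   # each leaf is a sorted list of keys
    firsts = []     # firsts[j] is the smallest key of leaves[j + 1]; used for routing
    for key in keys:
        i = bisect.bisect_right(firsts, key)  # leaf whose key range covers `key`
        leaf = leaves[i]
        bisect.insort(leaf, key)
        if len(leaf) > capacity:
            # Split: the left half stays in place, the right half becomes a new leaf.
            mid = len(leaf) // 2
            right = leaf[mid:]
            del leaf[mid:]
            leaves.insert(i + 1, right)
            firsts.insert(i, right[0])
    return len(leaves)


def packed_leaves(n, capacity):
    """Leaf count if the same n keys were bulk-loaded into completely full leaves."""
    return (n + capacity - 1) // capacity


if __name__ == "__main__":
    n, capacity = 1000, 4
    ascending = list(range(n))
    shuffled = ascending[:]
    random.Random(42).shuffle(shuffled)  # fixed seed so runs are repeatable

    print("ascending inserts:", count_leaves(ascending, capacity))
    print("shuffled inserts: ", count_leaves(shuffled, capacity))
    print("packed bulk load: ", packed_leaves(n, capacity))
```

With ascending inserts, every split leaves behind a half-full leaf that never receives another key, so the same 1,000 keys need 499 leaves versus 250 for the packed layout; shuffled inserts typically land in between. Real B+Tree implementations may use different split policies, so treat the numbers as illustrative of the effect, not of TDB2's actual file sizes.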

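Andy's second point concerns sparse files: a file's apparent length (what `ls -l` shows) can be much larger than the disk space actually allocated to it (what `du` shows by default, unless `--apparent-size` is given). The sketch below is a generic illustration, not how TDB2 itself creates its files. It assumes a POSIX system (it relies on `os.stat().st_blocks`, which is not available on Windows) and a filesystem that supports sparse files, and it uses hypothetical helpers `make_sparse_file` and `file_sizes`. It writes a single byte at the end of an 8 MiB region, mirroring the 8M chunk size mentioned above.

```python
import os
import tempfile

CHUNK = 8 * 1024 * 1024  # 8 MiB, the chunk size mentioned in the thread


def make_sparse_file(path, size=CHUNK):
    """Create a file whose apparent size is `size` but which has only one byte written."""
    with open(path, "wb") as f:
        f.seek(size - 1)   # skip over the hole without writing zeros
        f.write(b"\0")     # writing the last byte fixes the file length


def file_sizes(path):
    """Return (apparent size, allocated size) in bytes. POSIX only (assumption)."""
    st = os.stat(path)
    apparent = st.st_size             # file length, as reported by `ls -l`
    allocated = st.st_blocks * 512    # st_blocks counts 512-byte blocks, as `du` does
    return apparent, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "chunk.dat")
        make_sparse_file(p)
        apparent, allocated = file_sizes(p)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a sparse-capable filesystem the allocated figure is usually a few kilobytes while the apparent size is the full 8 MiB, which is why two tools can report very different "sizes" for the same database directory.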


