Re: Fuseki TDB database size growth

Rob Vesse Wed, 30 Aug 2017 02:29:37 -0700

No, it is perfectly usable as a primary database.

However, if your use case regularly rewrites your entire database, you are
going to have problems. This would be true of any database system, although
implementation specifics obviously affect how severe the problems are.
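
To make the growth concrete, here is a minimal sketch in Python of a
rewrite-everything workload against an append-only node table, as described
further down the thread. It is a toy model and not TDB's actual implementation:
the class AppendOnlyNodeTable, the function nightly_rewrite, the ex:hasValue
predicate, and the page and blank node labels are all made up for illustration,
and it assumes that each reload produces fresh blank nodes.

```python
class AppendOnlyNodeTable:
    """Toy node table: terms get an id on first sight and are never removed."""

    def __init__(self):
        self._ids = {}

    def intern(self, term):
        # Allocate a new id only for terms that have not been seen before.
        if term not in self._ids:
            self._ids[term] = len(self._ids)
        return self._ids[term]

    def __len__(self):
        return len(self._ids)


def nightly_rewrite(table, triples, night, n_pages):
    """Delete every triple, then re-insert the same data with fresh blank nodes."""
    triples.clear()  # the triples go away, but the node table is left untouched
    for page in range(n_pages):
        subject = f"wiki:page{page}"
        blank = f"_:b{night}_{page}"  # assumption: new blank node on every load
        for term in (subject, "ex:hasValue", blank):
            table.intern(term)
        triples.add((subject, "ex:hasValue", blank))


table = AppendOnlyNodeTable()
triples = set()
sizes = []
for night in range(3):
    nightly_rewrite(table, triples, night, n_pages=100)
    sizes.append((len(triples), len(table)))

for night, (n_triples, n_nodes) in enumerate(sizes):
    print(f"night {night}: {n_triples} triples, {n_nodes} node table entries")
```

In this model the triple count stays flat from night to night, while the node
table gains one entry per blank node per night, because nothing ever removes
the entries that are no longer referenced.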


Rob

On 22/08/2017 03:22, "Chris Tomlinson" <chris.j.tomlin...@gmail.com> wrote:

    Hi,
    
    This is interesting to know about blank nodes and reference counting. Does
    the comment about deleted triples not freeing blank nodes also apply when an
    entire named graph that includes some blank nodes is deleted?
    
    If so, it seems that in production Jena/TDB is expected either to be
    reloaded from scratch periodically or not to use blank nodes very much.
    
    In this case, is Jena/TDB aimed more at use cases where it functions like
    an index cache rather than a primary database? Is this accurate? If so,
    what sort of primary database systems are typically found coupled with
    Jena/TDB?
    
    Regards,
    Chris
    
    > On Aug 21, 2017, at 05:28, Rob Vesse <rve...@dotnetrdf.org> wrote:
    > 
    > All the data structures used in TDB are, broadly speaking, append only.
    > This means that the database will tend to grow in size over time.
    > 
    > Certain ways of using the database can exacerbate this. In your example I
    > would guess that you have a lot of blank nodes present in the data?
    > 
    > Each unique blank node is assigned a unique identifier inside the system,
    > so new blank nodes continually expand the node table. TDB does not
    > implement reference counting, so even if you delete every triple that
    > references a given RDF node, that node will never be removed from the
    > node table.
    > 
    > Similarly, as the indexes are updated they do not reclaim space, so the
    > B+Trees will continue to grow over time.
    > 
    > Reloading from scratch creates a smaller database because it is able to
    > maximally pack the data into the data structures on disk and you do not
    > have any unused identifiers allocated.
    > 
    > Rob
    > 
    > On 21/08/2017 11:20, "Lorenzo Manzoni" <lmanz...@imolinfo.it> wrote:
    > 
    >    Hi,
    > 
    >    I'm writing because we are seeing a behavior of Fuseki TDB that we
    >    cannot understand:
    > 
    >    */the fuseki database filesystem size continues to grow even if the 
    >    number of triples does not increase substantially./*
    > 
    >    We are using the latest version of Fuseki (3.4.0) as the triple store
    >    of a Semantic MediaWiki (MW 1.24, SMW 2.1.1), and every night a
    >    scheduled job updates the wiki pages and executes maintenance scripts
    >    (e.g.
    >    https://www.semantic-mediawiki.org/wiki/Help:Maintenance_script_%22rebuildData.php%22).
    >    These scripts update the semantic data on the wiki and the triples on
    >    Fuseki. Basically every triple is rewritten.
    > 
    >    We have observed that the Fuseki database size on disk grew over time
    >    to 20 GB, but when we recreate it from scratch the database size is
    >    only 500 MB.
    > 
    >    After that, the Fuseki database grows by about 200 MB every day, while
    >    the number of triples does not change substantially.
    > 
    >    I originally assumed that the rebuild data script was the problem but 
    >    when I executed it alone the fuseki database space did not increase.
    > 
    >    We are running Fuseki on a 64-bit Red Hat machine.
    > 
    >    Can someone help us?
    > 
    >    Thanks in advance,
    > 
    >    Lorenzo
    > 
