That's usually what I see done in the literature.
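
For reference, the proportional estimate described in the quoted message amounts to something like the rough Python sketch below. The function name and the example figures are made up for illustration, and the triple counts are assumed to come from your own COUNT queries against the dataset.

```python
def proportional_estimate(db_bytes: int, graph_triples: int, total_triples: int) -> float:
    """Attribute a share of the whole database size to one graph,
    in proportion to the number of triples it holds."""
    if total_triples <= 0:
        raise ValueError("total_triples must be positive")
    return db_bytes * graph_triples / total_triples


# Hypothetical figures: a 10 GiB database in which the graph of interest
# holds 2 million of the 40 million triples.
estimate = proportional_estimate(10 * 1024**3, 2_000_000, 40_000_000)
print(f"{estimate / 1024**2:.0f} MiB")
```

This treats every triple as costing the same, which is the simplification the points below pick apart.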

Accounting for the exact amount of disk usage is difficult for a number of 
reasons:

- Terms are dictionary encoded, so each URI, literal and blank node identifier 
is stored only once and mapped to an internal constant-size identifier (64 bits 
for TDB1). So however many times a term is used, its storage is its encoded 
size plus N times the identifier size. How that "shared" disk usage contributes 
to an individual graph is therefore subject to interpretation.
- Similarly, there is no reference counting for terms. So if data is deleted 
from a graph, some of the disk usage is never reclaimed, and there is no way to 
track this. It also means that if you want to know how many times a given term 
is used, you need to query the database to find out.
- Index size will vary depending upon the data, including how it was loaded and 
how many updates have happened. For example, tdbloader2 will produce maximally 
packed indexes, but as soon as you start running updates the indexes will 
expand as the B+Trees get rebalanced. And again, how do you account for the 
overhead of the on-disk index data structures?
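
To make the first point concrete, here is a toy Python model of the dictionary-encoding arithmetic. It is only a sketch of the "encoded size plus N times the identifier size" rule stated above: the function names and the example numbers are invented, and it ignores index structure and any other on-disk overhead entirely.

```python
ID_BYTES = 8  # 64-bit internal identifier, as described above for TDB1


def term_storage(encoded_size: int, uses: int) -> int:
    """Bytes attributable to one term: its encoded form is stored once,
    plus one fixed-size identifier per use."""
    return encoded_size + uses * ID_BYTES


def graph_share_bounds(encoded_size: int, uses_in_graph: int) -> tuple:
    """Lower and upper bound on one graph's share of a term's storage.

    Lower bound: the graph is charged only for its own identifier references.
    Upper bound: the graph is also charged the full dictionary entry,
    even though other graphs may use the same term.
    """
    refs = uses_in_graph * ID_BYTES
    return refs, refs + encoded_size


# Hypothetical term: a 60-byte URI used 100 times overall, 10 of them in our graph.
print(term_storage(60, 100))       # whole-database cost of the term
print(graph_share_bounds(60, 10))  # the graph's share lies somewhere in this range
```

The gap between the two bounds is exactly the "subject to interpretation" part: any value in that range is a defensible answer for a shared term.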

One "hack" might be to export the graph in question, import it into a separate 
TDB instance and get the disk size of that. However, as explained above, you 
would end up overestimating to some extent.
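
For the last step of that hack, a minimal Python sketch for getting the disk size of the scratch database directory could look like the following. It assumes the export and import have already been done with whatever tools you normally use; the function name is made up, and it sums apparent file sizes, which may differ from what `du` reports in blocks.

```python
import os


def directory_size(path: str) -> int:
    """Sum the apparent sizes (in bytes) of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so that nothing outside the directory is counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total
```

You would call it as `directory_size("/path/to/scratch-tdb")`, where the path is a placeholder for wherever the separate TDB instance was created.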

Rob

On 04/06/2018, 13:18, "Mikael Pesonen" <mikael.peso...@lingsoft.fi> wrote:

    
    Hi,
    
    What would be the best way to estimate how much disk space (bytes) a 
    single graph is using in Fuseki?
    
    The only option that came to mind is to get the entire db disk usage with 
    a Linux system call and take the same proportion as there are triples in 
    the graph vs. in all graphs. That would be a rough estimate.
    
    Thank you
    
    -- 
    Lingsoft - 30 years of Leading Language Management
    
    www.lingsoft.fi
    
    Speech Applications - Language Management - Translation - Reader's and 
    Writer's Tools - Text Tools - E-books and M-books
    
    Mikael Pesonen
    System Engineer
    
    e-mail: mikael.peso...@lingsoft.fi
    Tel. +358 2 279 3300
    
    Time zone: GMT+2
    
    Helsinki Office
    Eteläranta 10
    FI-00130 Helsinki
    FINLAND
    
    Turku Office
    Kauppiaskatu 5 A
    FI-20100 Turku
    FINLAND
    
    



