TDB maintains indexes in multiple orders to support fast queries. HDT [1] 
appears to store one order and to add further information when loaded into main 
memory. That might be part of the difference.
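
As a rough illustration of why extra index orders cost space, here is a minimal sketch. It is a toy model, not TDB or HDT code: the function names are made up for this example, the orders SPO, POS and OSP are used for illustration, and the arithmetic counts only the raw 8-byte NodeIds mentioned further down the thread, ignoring any B+tree or block overhead.

```python
# Hypothetical toy model, not TDB or HDT code.
NODE_ID_BYTES = 8  # per the thread: a NodeId is stored as 8 bytes


def index_bytes(num_triples, index_orders):
    """Raw bytes for the triple indexes alone: 3 NodeIds per triple, per order."""
    return num_triples * 3 * NODE_ID_BYTES * len(index_orders)


def build_indexes(triples, orders=("SPO", "POS", "OSP")):
    """Store every triple once per order, sorted, so that a lookup by
    subject, by predicate or by object each has an index it can scan by prefix."""
    pos = {"S": 0, "P": 1, "O": 2}
    return {
        order: sorted(tuple(t[pos[c]] for c in order) for t in triples)
        for order in orders
    }


if __name__ == "__main__":
    one = index_bytes(1_000_000, ("SPO",))
    three = index_bytes(1_000_000, ("SPO", "POS", "OSP"))
    print(one, three)  # three orders take three times the raw index space
```

Under these assumptions, a store with three orders needs three times the raw index space of a store with one, before any compression is considered.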

ajs6f

[1] http://www.rdfhdt.org/hdt-internals/

> On Nov 26, 2017, at 7:56 AM, Laura Morales <laure...@mail.com> wrote:
> 
> I wonder... if TDB, like HDT, uses integers instead of strings, why is there 
> such a difference in the store size? HDT files are so much smaller.
>  
>  
> 
> Sent: Sunday, November 26, 2017 at 1:30 PM
> From: "Andy Seaborne" <a...@apache.org>
> To: users@jena.apache.org
> Subject: Re: Estimating TDB2 size
> Every RDFTerm gets a NodeId in TDB. A triple is 3 NodeIds.
> 
> There is a big cache, NodeId->RDFTerm.
> 
> In TDB1 and TDB2, a NodeId is stored as 8 bytes. The TDB2 design allows an int
> plus a long (96 bits), but the current implementation uses 64 bits.
> 
> It is very common as a design to dictionary-encode (intern) terms because joins
> can be done by comparing integers rather than testing whether two strings are
> the same, which is much more expensive.
> 
> In addition, TDBx inlines numbers (integers), date/times and some others.
> 
> https://jena.apache.org/documentation/tdb/architecture.html
> 
> TDBx could, but doesn't, store compressed data on disk. There are pros
> and cons of this.
> 
> Andy
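
The dictionary (intern) design described in the quoted message can be sketched as follows. This is a minimal illustration only: `NodeTable`, `intern`, `term` and `encode` are hypothetical names, the ids are plain Python integers rather than TDB's 8-byte NodeIds, and nothing here reflects how TDB lays the table out on disk.

```python
class NodeTable:
    """Toy dictionary table mapping RDF term strings to integer ids and back."""

    def __init__(self):
        self._term_to_id = {}   # term string -> id
        self._id_to_term = []   # id -> term string

    def intern(self, term):
        """Return the id for a term, allocating the next id on first sight."""
        node_id = self._term_to_id.get(term)
        if node_id is None:
            node_id = len(self._id_to_term)
            self._term_to_id[term] = node_id
            self._id_to_term.append(term)
        return node_id

    def term(self, node_id):
        """Look up the original term for an id."""
        return self._id_to_term[node_id]


def encode(table, triple):
    """Turn a triple of term strings into a triple of ids."""
    return tuple(table.intern(t) for t in triple)


if __name__ == "__main__":
    table = NodeTable()
    a = encode(table, ("ex:alice", "foaf:knows", "ex:bob"))
    b = encode(table, ("ex:bob", "foaf:knows", "ex:alice"))
    # A join on "object of a == subject of b" is one integer comparison.
    print(a[2] == b[0], table.term(a[2]))
```

Each distinct term is stored once in the table, and the triples themselves hold only ids, so a join compares integers and never touches the strings.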
