Re: JENA Loader Benchmarks

Marco Neumann Tue, 18 Jun 2019 05:45:29 -0700

Andy, just one observation. There seems to be quite a bit of data replication
going on in the respective tdb / tdb2 folders.


Is it possible to instruct tdb/tdb2 to create a database with only one
default graph? It seems to be quite safe to manually remove the files that
contain G-indexes from disk while maintaining query consistency in the
default graph, and it would reduce the tdb database footprint on disk by 1/3.

Not to speak of an option for LZW compression à la HDT.
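
To put a number on the G-index share for a given database, a minimal sketch
along these lines could be used. It only measures, it does not delete
anything. The function name g_index_share is made up for illustration, and
the file-name pattern (four letters from G, S, P, O including a G, such as
GSPO.dat) is my assumption about how the quad index files are named, not
something the loader reports. Sizes are apparent file sizes in bytes, which
may differ from allocated blocks.

```shell
#!/bin/sh
# Sketch: share of a TDB/TDB2 directory taken up by quad ("G") index files.
# ASSUMPTION: quad index files are named with four letters from G,S,P,O
# that include a G (e.g. GSPO.dat, POSG.idn). This is not a Jena API.

# g_index_share DIR -> prints "<g_bytes> <total_bytes> <percent>"
g_index_share() {
    dir=$1
    # Walk regular files recursively (TDB2 keeps its files in a subfolder)
    # and emit "<bytes><TAB><path>" for each one.
    find "$dir" -type f | while IFS= read -r f; do
        printf '%s\t%s\n' "$(wc -c < "$f")" "$f"
    done | awk -F'\t' '
        {
            bytes = $1 + 0
            # Reduce the path to the bare index name, e.g. GSPO
            n = split($2, parts, "/")
            name = parts[n]
            sub(/\..*$/, "", name)
            total += bytes
            if (name ~ /^[GSPO][GSPO][GSPO][GSPO]$/ && name ~ /G/) g += bytes
        }
        END {
            pct = (total > 0) ? 100 * g / total : 0
            printf "%d %d %.1f\n", g, total, pct
        }'
}
```

Calling g_index_share with the database folder as its only argument prints
the G-index bytes, the total bytes and the percentage on one line.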



On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne <a...@apache.org> wrote:

>
>
> On 14/06/2019 18:13, Marco Neumann wrote:
> > I am collecting Jena loader benchmarks. If you have results, please post
> > them directly.
> >
> > http://www.lotico.com/index.php/JENA_Loader_Benchmarks
>
> tdb2.tdbloader has variations controlled by --loader.
>
> --loader=
> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> 'light'
>
> "basic" is a super naive parser-add triple loop - it is used if a loader
> can't cope with an already loaded database.
>
> "phased" is a balanced loader that does not saturate the machine. Some
> parallelism.
>
> "sequential" is the tdbloader algorithm for TDB2, more for reference.
>
> "parallel" is as much parallelism as it wants. (5 for triples, more for
> quads)
>
> "light" is two-threaded. Slightly lighter than "phased".
>
> See LoaderPlans.
>
> > On a Linux machine I am using "time" to collect data.
> >
> > Is there a flag on tdb2.tdbloader to report time and triples per second?
> >
> > I have noticed that storage space use for tdbloader2 is significantly
> > smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> > straightforward explanation here?
> >
>
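
For comparing the --loader variants listed above, a rough sketch of a timing
wrapper is below. It is not a built-in loader feature. The helper names
timed_load and triples_per_second are made up for illustration, and the
triple count assumes N-Triples input with exactly one triple per line (no
blank or comment lines). Timing uses whole seconds, so very short loads
report "n/a".

```shell
#!/bin/sh
# Sketch: time one tdb2.tdbloader run and derive triples per second.
# ASSUMPTION: FILE is N-Triples with one triple per line, so the line
# count equals the triple count.

# triples_per_second COUNT SECONDS -> prints the integer rate, or "n/a"
triples_per_second() {
    awk -v n="$1" -v s="$2" \
        'BEGIN { if (s > 0) printf "%d\n", n / s; else print "n/a" }'
}

# timed_load LOADER DB_DIR FILE
# LOADER is one of: basic, phased, sequential, parallel, light
timed_load() {
    loader=$1
    db=$2
    file=$3
    start=$(date +%s)
    tdb2.tdbloader --loader="$loader" --loc "$db" "$file" || return 1
    end=$(date +%s)
    secs=$((end - start))
    count=$(wc -l < "$file")
    echo "loader=$loader triples=$count seconds=$secs" \
         "tps=$(triples_per_second "$count" "$secs")"
}
```

Each variant should be loaded into a fresh, empty database folder so that
the runs stay comparable.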


-- 


---
Marco Neumann
KONA
