Andy, just one observation: there seems to be quite a lot of data replication going on in the respective tdb/tdb2 folders.
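To put a number on how much of that replication sits in the graph (quad) indexes, a small script can sum file sizes per index. This is a minimal sketch, assuming the quad-index files are named after their index (e.g. `GSPO.dat`, `GSPO.idn`); the default name set below is an assumption, so check it against the files actually present in your database directory. It reports apparent file sizes, which may differ from `du` on filesystems with sparse files.

```python
from pathlib import Path

# Assumed quad-index names; adjust to what your tdb/tdb2 directory really contains.
GRAPH_INDEXES = {"GSPO", "GPOS", "GOSP", "SPOG", "POSG", "OSPG"}

def index_footprint(db_dir, graph_indexes=GRAPH_INDEXES):
    """Return (graph_index_bytes, total_bytes) for all regular files under db_dir.

    A file counts toward the graph indexes when its name up to the first dot
    is one of graph_indexes (e.g. 'GSPO.dat'). Subdirectories such as a TDB2
    'Data-0001' folder are included via rglob.
    """
    graph_bytes = 0
    total_bytes = 0
    for path in Path(db_dir).rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size  # apparent size, not allocated blocks
        total_bytes += size
        if path.name.split(".", 1)[0] in graph_indexes:
            graph_bytes += size
    return graph_bytes, total_bytes
```

For example, `g, total = index_footprint("DB2")` followed by `print(f"{g / total:.1%}")` shows what share of the directory the graph indexes take, which can be compared with the one-third estimate.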
Is it possible to instruct tdb/tdb2 to create a database with only a default graph? It seems quite safe to manually remove the files on disk that hold the G-indexes while keeping queries on the default graph consistent, and doing so would reduce the tdb database footprint on disk by about a third. Not to mention an option for LZW compression à la HDT.

On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne <a...@apache.org> wrote:
>
> On 14/06/2019 18:13, Marco Neumann wrote:
> > I am collecting Jena loader benchmarks. If you have results, please post
> > them directly.
> >
> > http://www.lotico.com/index.php/JENA_Loader_Benchmarks
>
> tdb2.tdbloader has variations controlled by --loader.
>
> --loader=
> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> 'light'
>
> "basic" is a super naive parser-add triple loop - it is used if a loader
> can't cope with an already loaded database.
>
> "phased" is a balanced loader that does not saturate the machine. Some
> parallelism.
>
> "sequential" is the tdbloader algorithm for TDB2, more for reference.
>
> "parallel" is as much parallelism as it wants. (5 for triples, more for
> quads)
>
> "light" is two threaded. Slightly lighter than "phased".
>
> See LoaderPlans.
>
> > On a Linux machine I am using "time" to collect data.
> >
> > Is there a flag on tdb2.tdbloader to report time and triples per second?
> >
> > I have noticed that storage space use for tdbloader2 is significantly
> > smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> > straightforward explanation here?
>

--
---
Marco Neumann
KONA
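On the timing question quoted above: without relying on any particular loader flag, the `time` measurement can be scripted so that each run also yields a triples-per-second figure. This is a minimal sketch; the function names are hypothetical, and the triple count has to be supplied by the caller because the script does not inspect the loader's output.

```python
import subprocess
import time

def time_command(cmd):
    """Run cmd (a list of arguments) and return elapsed wall-clock seconds.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

def triples_per_second(n_triples, seconds):
    """Throughput for a load of n_triples that took `seconds` of wall-clock time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_triples / seconds
```

For example, `elapsed = time_command(["tdb2.tdbloader", "--loader=parallel", "--loc=DB2", "data.nt"])` followed by `triples_per_second(n, elapsed)`, where `n` is the known triple count (for an N-Triples file without blank or comment lines, its line count). Repeating this per `--loader` value gives comparable numbers for the benchmark page.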