That did the trick, Andy, very good. It might be a good idea to add this to the distribution in jena-log4j.properties.
I am getting these numbers on a midsize dedicated server. Very nice numbers indeed, Andy. Well done!

00:24:53 INFO loader :: Loader = LoaderPhased
00:24:53 INFO loader :: Start: ../../public_html/lotico.ttl.gz
00:24:55 INFO loader :: Add: 500,000 lotico.ttl.gz (Batch: 237,755 / Avg: 237,755)
00:24:56 INFO loader :: Add: 1,000,000 lotico.ttl.gz (Batch: 305,250 / Avg: 267,308)
00:24:58 INFO loader :: Add: 1,500,000 lotico.ttl.gz (Batch: 313,087 / Avg: 281,004)
00:25:00 INFO loader :: Add: 2,000,000 lotico.ttl.gz (Batch: 328,299 / Avg: 291,502)
00:25:01 INFO loader :: Add: 2,500,000 lotico.ttl.gz (Batch: 341,763 / Avg: 300,336)
00:25:03 INFO loader :: Add: 3,000,000 lotico.ttl.gz (Batch: 337,381 / Avg: 305,935)
00:25:04 INFO loader :: Add: 3,500,000 lotico.ttl.gz (Batch: 318,877 / Avg: 307,719)
00:25:06 INFO loader :: Add: 4,000,000 lotico.ttl.gz (Batch: 295,857 / Avg: 306,184)
00:25:07 INFO loader :: Add: 4,500,000 lotico.ttl.gz (Batch: 327,225 / Avg: 308,388)
00:25:09 INFO loader :: Add: 5,000,000 lotico.ttl.gz (Batch: 349,406 / Avg: 312,051)
00:25:09 INFO loader :: Elapsed: 16.02 seconds [2019/06/15 00:25:09 CEST]
00:25:11 INFO loader :: Add: 5,500,000 lotico.ttl.gz (Batch: 285,062 / Avg: 309,388)
00:25:13 INFO loader :: Add: 6,000,000 lotico.ttl.gz (Batch: 203,665 / Avg: 296,559)
00:25:16 INFO loader :: Add: 6,500,000 lotico.ttl.gz (Batch: 189,393 / Avg: 284,190)

On another machine, which sits somewhere in the Azure infrastructure, tdbloader doesn't look as good. Even with decent hardware it seems to die a slow death of memory exhaustion at 16GB. It started off at 70kT/s, is now down to 17kT/s, and is still going. Lesson learned: big iron and big memory is the way to go with the Jena tdbloaders.

On Fri, Jun 14, 2019 at 10:53 PM Andy Seaborne <a...@apache.org> wrote:

> These messages are logged (to logger "org.apache.jena.tdb2.loader") - do
> you have log4j.properties in the current working directory?
>
> Do you get any output?
>
> INFO Loader = LoaderParallel
> INFO Start: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz
> INFO Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
> INFO Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)
> INFO Add: 1,500,000 bsbm-5m.nt.gz (Batch: 205,676 / Avg: 170,920)
> INFO Add: 2,000,000 bsbm-5m.nt.gz (Batch: 204,248 / Avg: 178,189)
> INFO Add: 2,500,000 bsbm-5m.nt.gz (Batch: 202,101 / Avg: 182,508)
> INFO Add: 3,000,000 bsbm-5m.nt.gz (Batch: 206,953 / Avg: 186,173)
> INFO Add: 3,500,000 bsbm-5m.nt.gz (Batch: 183,621 / Avg: 185,804)
> INFO Add: 4,000,000 bsbm-5m.nt.gz (Batch: 151,423 / Avg: 180,676)
> INFO Add: 4,500,000 bsbm-5m.nt.gz (Batch: 152,765 / Avg: 177,081)
> INFO Add: 5,000,000 bsbm-5m.nt.gz (Batch: 158,881 / Avg: 175,076)
> INFO Elapsed: 28.56 seconds [2019/06/14 22:51:37 BST]
> INFO Finished: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz: 5,000,599 tuples in 28.63s (Avg: 174,644)
> INFO Finish - index SPO
> INFO Finish - index POS
> INFO Finish - index OSP
> INFO Time = 35.572 seconds : Triples = 5,000,599 : Rate = 140,577 /s
>
>
> There is a pause after the first "Finished:" - at that point the data
> input has finished, but the index threads are still running, and the
> pause comes from the flush to disk.
>
> Andy
>
> On 14/06/2019 20:16, Marco Neumann wrote:
> > Let me fire up one of the big machines to see what I will get there.
> > Currently I have no info display during load with tdb2.tdbloader. If -v is
> > specified I get some extra info but no load info.
> >
> > On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne <a...@apache.org> wrote:
> >
> >>
> >>
> >> On 14/06/2019 18:13, Marco Neumann wrote:
> >>> I am collecting Jena loader benchmarks. If you have results, please post
> >>> them directly.
> >>>
> >>> http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> >>
> >> tdb2.tdbloader has variations controlled by --loader.
> >>
> >> --loader=
> >>     Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> >>     'light'
> >>
> >> "basic" is a super naive parser-add triple loop - it is used if a loader
> >> can't cope with an already loaded database.
> >>
> >> "phased" is a balanced loader that does not saturate the machine. Some
> >> parallelism.
> >>
> >> "sequential" is the tdbloader algorithm for TDB2, more for reference.
> >>
> >> "parallel" is as much parallelism as it wants. (5 for triples, more for
> >> quads)
> >>
> >> "light" is two threaded. Slightly lighter than "phased".
> >>
> >> See LoaderPlans.
> >>
> >>> On a Linux machine I am using "time" to collect data.
> >>>
> >>> Is there a flag on tdb2.tdbloader to report time and triples per second?
> >>>
> >>> I have noticed that storage space use for tdbloader2 is significantly
> >>> smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> >>> straightforward explanation here?
> >>>
> >>
> >
> >
>

--
---
Marco Neumann
KONA
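
For anyone collecting figures from loader logs like the ones quoted in this thread, here is a minimal Python sketch that pulls the tuple count, batch rate and running average out of the "Add:" progress lines. It is not part of Jena. The function names (parse_progress, summarize) are made up for illustration, and the regular expression only assumes the line layout seen in the logs above, so it may need adjusting for other log patterns.

```python
import re

# Matches progress lines of the shape seen in this thread, for example:
#   00:24:55 INFO loader :: Add: 500,000 lotico.ttl.gz (Batch: 237,755 / Avg: 237,755)
# The layout is an assumption taken from the quoted logs, not from a Jena spec.
ADD_LINE = re.compile(
    r"Add:\s+(?P<count>[\d,]+)\s+(?P<file>\S+)\s+"
    r"\(Batch:\s+(?P<batch>[\d,]+)\s+/\s+Avg:\s+(?P<avg>[\d,]+)\)"
)


def to_int(text):
    """Convert a number with thousands separators ('1,000,000') to an int."""
    return int(text.replace(",", ""))


def parse_progress(lines):
    """Return (tuples_so_far, batch_rate, avg_rate) for every 'Add:' line."""
    rows = []
    for line in lines:
        match = ADD_LINE.search(line)
        if match:
            rows.append(
                (
                    to_int(match["count"]),
                    to_int(match["batch"]),
                    to_int(match["avg"]),
                )
            )
    return rows


def summarize(rows):
    """Summarize parsed rows; returns None when no progress lines were found."""
    if not rows:
        return None
    batch_rates = [batch for _, batch, _ in rows]
    return {
        "tuples": rows[-1][0],        # last reported tuple count
        "min_batch": min(batch_rates),
        "max_batch": max(batch_rates),
        "last_avg": rows[-1][2],      # running average at the last line
    }
```

To use it, save the loader output to a file, pass the file's lines to parse_progress, and hand the result to summarize. Lines that are not "Add:" progress lines, such as "Elapsed:" or "Finish - index SPO", are skipped.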