Hi,

I am loading an N-Triples file using tdbloader2, and I am curious about
the meaning of the numbers in the loader output.  The loading output
started as follows:

 09:54:15 INFO -- TDB Bulk Loader Start
 09:54:15 INFO Data Load Phase
 09:54:15 INFO Got 1 data files to load
 09:54:15 INFO Data file 1: /home/ubuntu/dataset.nq.gz
09:54:59 INFO  loader          :: Load: /home/ubuntu/dataset.nq.gz -- 2021/02/08 09:54:59 UTC
09:55:01 INFO  loader          :: Add: 50,000 Data (Batch: 19,912 / Avg: 19,912)
09:55:03 INFO  loader          :: Add: 100,000 Data (Batch: 23,288 / Avg: 21,468)
09:55:05 INFO  loader          :: Add: 150,000 Data (Batch: 26,123 / Avg: 22,824)
09:55:07 INFO  loader          :: Add: 200,000 Data (Batch: 24,987 / Avg: 23,329)
09:55:09 INFO  loader          :: Add: 250,000 Data (Batch: 25,641 / Avg: 23,757)
09:55:11 INFO  loader          :: Add: 300,000 Data (Batch: 25,100 / Avg: 23,971)
09:55:13 INFO  loader          :: Add: 350,000 Data (Batch: 24,213 / Avg: 24,005)
09:55:15 INFO  loader          :: Add: 400,000 Data (Batch: 24,461 / Avg: 24,061)
09:55:17 INFO  loader          :: Add: 450,000 Data (Batch: 25,667 / Avg: 24,230)
09:55:19 INFO  loader          :: Add: 500,000 Data (Batch: 25,879 / Avg: 24,385)
09:55:19 INFO  loader          ::   Elapsed: 20.50 seconds [2021/02/08 09:55:19 UTC]
09:55:21 INFO  loader          :: Add: 550,000 Data (Batch: 25,484 / Avg: 24,481)
09:55:23 INFO  loader          :: Add: 600,000 Data (Batch: 23,419 / Avg: 24,389)
09:55:25 INFO  loader          :: Add: 650,000 Data (Batch: 25,012 / Avg: 24,436)
09:55:27 INFO  loader          :: Add: 700,000 Data (Batch: 25,201 / Avg: 24,489)
09:55:29 INFO  loader          :: Add: 750,000 Data (Batch: 26,288 / Avg: 24,601)
09:55:31 INFO  loader          :: Add: 800,000 Data (Batch: 25,960 / Avg: 24,682)
09:55:33 INFO  loader          :: Add: 850,000 Data (Batch: 24,330 / Avg: 24,661)
09:55:35 INFO  loader          :: Add: 900,000 Data (Batch: 25,813 / Avg: 24,722)
09:55:37 INFO  loader          :: Add: 950,000 Data (Batch: 26,164 / Avg: 24,794)
09:55:39 INFO  loader          :: Add: 1,000,000 Data (Batch: 26,357 / Avg: 24,868)
09:55:39 INFO  loader          ::   Elapsed: 40.21 seconds [2021/02/08 09:55:39 UTC]

My first questions are:

1) I guess that 600,000 is the number of data items loaded by 09:55:23.
What does "Data" mean here?  Does it refer to bytes or triples?

2) What are the numbers Batch: 23,419 and Avg: 24,389?  I guess they are
related to the loading speed.
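
If I read them as rates, the numbers seem to check out: at 09:55:39 the
loader reports 1,000,000 Data with an elapsed time of 40.21 seconds, and
1,000,000 / 40.21 ≈ 24,869, which is essentially the reported Avg of
24,868.  So my guess is that Avg is the cumulative items per second
since the start, and Batch is the rate over just the most recent 50,000
items, but I would like to confirm this.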

After several days of loading, the output shows different numbers:

10:21:45 INFO  loader          ::   Elapsed: 433,606.84 seconds [2021/02/13 10:21:45 UTC]
10:21:48 INFO  loader          :: Add: 505,550,000 Data (Batch: 18,348 / Avg: 1,165)
10:21:51 INFO  loader          :: Add: 505,600,000 Data (Batch: 18,656 / Avg: 1,166)
10:22:55 INFO  loader          :: Add: 505,650,000 Data (Batch: 781 / Avg: 1,165)
10:36:12 INFO  loader          :: Add: 505,700,000 Data (Batch: 62 / Avg: 1,163)
10:36:14 INFO  loader          :: Add: 505,750,000 Data (Batch: 17,543 / Avg: 1,164)
10:36:17 INFO  loader          :: Add: 505,800,000 Data (Batch: 17,385 / Avg: 1,164)
10:36:20 INFO  loader          :: Add: 505,850,000 Data (Batch: 17,998 / Avg: 1,164)
10:36:23 INFO  loader          :: Add: 505,900,000 Data (Batch: 17,170 / Avg: 1,164)
10:37:12 INFO  loader          :: Add: 505,950,000 Data (Batch: 1,025 / Avg: 1,164)
10:37:14 INFO  loader          :: Add: 506,000,000 Data (Batch: 18,301 / Avg: 1,164)
10:37:14 INFO  loader          ::   Elapsed: 434,535.94 seconds [2021/02/13 10:37:14 UTC]

Now the Batch and Avg numbers are smaller, and it is taking much longer
to load each 500,000 items.  At the beginning it took about 20 seconds
to load 500,000 items; now it is taking 929 seconds.  Why has the load
speed degraded?  In my experience loading big datasets into Jena,
loading always gets slower the more data has already been loaded.
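
The slowdown is also visible inside single batches: the Batch: 62 line
implies 50,000 / 62 ≈ 800 seconds for that one chunk of 50,000 items,
which is roughly consistent with the 13-minute gap between the 10:22:55
and 10:36:12 lines.  To watch this over the whole run I pull the Batch
and Avg columns out of the saved loader output with a quick Python
sketch like the one below (load.log is just an example name for the
redirected output):

import re
import sys

# Quick sketch to pull the Batch/Avg columns out of a saved loader log.
# "Add:" lines look like:
#   10:36:12 INFO  loader          :: Add: 505,700,000 Data (Batch: 62 / Avg: 1,163)
ADD_LINE = re.compile(r"Add: ([\d,]+) Data \(Batch: ([\d,]+) / Avg: ([\d,]+)\)")

def rates(path):
    with open(path) as log:
        for line in log:
            m = ADD_LINE.search(line)
            if m:
                # Strip the thousands separators before converting.
                yield tuple(int(g.replace(",", "")) for g in m.groups())

if __name__ == "__main__":
    # Usage: python loader_rates.py load.log
    for count, batch, avg in rates(sys.argv[1]):
        print(f"{count:>15,}  batch={batch:>7,}/s  avg={avg:>7,}/s")

Printing (or plotting) the batch column this way makes the periodic
stalls easy to spot.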

Best regards,
Daniel
