Hi,
I am loading an N-Triples file using tdbloader2, and I am curious about the meaning of the numbers in the loader output. The loading output started as follows:

09:54:15 INFO -- TDB Bulk Loader Start
09:54:15 INFO Data Load Phase
09:54:15 INFO Got 1 data files to load
09:54:15 INFO Data file 1: /home/ubuntu/dataset.nq.gz
09:54:59 INFO loader :: Load: /home/ubuntu/dataset.nq.gz -- 2021/02/08 09:54:59 UTC
09:55:01 INFO loader :: Add: 50,000 Data (Batch: 19,912 / Avg: 19,912)
09:55:03 INFO loader :: Add: 100,000 Data (Batch: 23,288 / Avg: 21,468)
09:55:05 INFO loader :: Add: 150,000 Data (Batch: 26,123 / Avg: 22,824)
09:55:07 INFO loader :: Add: 200,000 Data (Batch: 24,987 / Avg: 23,329)
09:55:09 INFO loader :: Add: 250,000 Data (Batch: 25,641 / Avg: 23,757)
09:55:11 INFO loader :: Add: 300,000 Data (Batch: 25,100 / Avg: 23,971)
09:55:13 INFO loader :: Add: 350,000 Data (Batch: 24,213 / Avg: 24,005)
09:55:15 INFO loader :: Add: 400,000 Data (Batch: 24,461 / Avg: 24,061)
09:55:17 INFO loader :: Add: 450,000 Data (Batch: 25,667 / Avg: 24,230)
09:55:19 INFO loader :: Add: 500,000 Data (Batch: 25,879 / Avg: 24,385)
09:55:19 INFO loader :: Elapsed: 20.50 seconds [2021/02/08 09:55:19 UTC]
09:55:21 INFO loader :: Add: 550,000 Data (Batch: 25,484 / Avg: 24,481)
09:55:23 INFO loader :: Add: 600,000 Data (Batch: 23,419 / Avg: 24,389)
09:55:25 INFO loader :: Add: 650,000 Data (Batch: 25,012 / Avg: 24,436)
09:55:27 INFO loader :: Add: 700,000 Data (Batch: 25,201 / Avg: 24,489)
09:55:29 INFO loader :: Add: 750,000 Data (Batch: 26,288 / Avg: 24,601)
09:55:31 INFO loader :: Add: 800,000 Data (Batch: 25,960 / Avg: 24,682)
09:55:33 INFO loader :: Add: 850,000 Data (Batch: 24,330 / Avg: 24,661)
09:55:35 INFO loader :: Add: 900,000 Data (Batch: 25,813 / Avg: 24,722)
09:55:37 INFO loader :: Add: 950,000 Data (Batch: 26,164 / Avg: 24,794)
09:55:39 INFO loader :: Add: 1,000,000 Data (Batch: 26,357 / Avg: 24,868)
09:55:39 INFO loader :: Elapsed: 40.21 seconds [2021/02/08 09:55:39 UTC]

My first questions are:

1) I guess that 600,000 is the amount of data loaded as of 09:55:23. What does "Data" mean here? Does it mean bytes or triples?
2) What are the numbers Batch: 23,419 and Avg: 24,389? I guess they are related to the loading speed.

After some days of loading, the output shows different numbers:

10:21:45 INFO loader :: Elapsed: 433,606.84 seconds [2021/02/13 10:21:45 UTC]
10:21:48 INFO loader :: Add: 505,550,000 Data (Batch: 18,348 / Avg: 1,165)
10:21:51 INFO loader :: Add: 505,600,000 Data (Batch: 18,656 / Avg: 1,166)
10:22:55 INFO loader :: Add: 505,650,000 Data (Batch: 781 / Avg: 1,165)
10:36:12 INFO loader :: Add: 505,700,000 Data (Batch: 62 / Avg: 1,163)
10:36:14 INFO loader :: Add: 505,750,000 Data (Batch: 17,543 / Avg: 1,164)
10:36:17 INFO loader :: Add: 505,800,000 Data (Batch: 17,385 / Avg: 1,164)
10:36:20 INFO loader :: Add: 505,850,000 Data (Batch: 17,998 / Avg: 1,164)
10:36:23 INFO loader :: Add: 505,900,000 Data (Batch: 17,170 / Avg: 1,164)
10:37:12 INFO loader :: Add: 505,950,000 Data (Batch: 1,025 / Avg: 1,164)
10:37:14 INFO loader :: Add: 506,000,000 Data (Batch: 18,301 / Avg: 1,164)
10:37:14 INFO loader :: Elapsed: 434,535.94 seconds [2021/02/13 10:37:14 UTC]

Now the Batch and Avg numbers are smaller. Also, each block of 500,000 data items is taking longer to load. At the beginning it took about 20 seconds to load 500,000; now it takes about 929 seconds (the difference between the two Elapsed lines above). Why has the load speed degraded? In my experience loading big datasets in Jena, loading always gets slower as more data has already been loaded.

Best regards,
Daniel
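
A quick sanity check on question 2, as a minimal sketch: it assumes (this is not stated anywhere in the loader output) that Avg is simply the total number of items added divided by the elapsed seconds. The figures are copied from the Elapsed lines in the log above; the names `checkpoints` and `average_rate` are only illustrative.

```python
# Assumption being tested: Avg = total items added / elapsed seconds.
# Each tuple is (items added, elapsed seconds, Avg reported on the same log line),
# copied from the log output quoted in this message.
checkpoints = [
    (500_000, 20.50, 24_385),
    (1_000_000, 40.21, 24_868),
    (506_000_000, 434_535.94, 1_164),
]


def average_rate(items_added, elapsed_seconds):
    """Return the overall load rate in items per second."""
    return items_added / elapsed_seconds


for items, elapsed, reported in checkpoints:
    computed = average_rate(items, elapsed)
    print(f"{items:,} items in {elapsed:,.2f} s: "
          f"computed {computed:,.0f}/s, log reports {reported:,}/s")

# Rate over the last 500,000 items, using the two late Elapsed lines.
late_rate = average_rate(500_000, 434_535.94 - 433_606.84)
print(f"rate over the last 500,000 items: {late_rate:,.0f}/s")
```

The computed values land within a fraction of a percent of the reported Avg figures, which is consistent with Avg being an overall items-per-second rate; the small differences may just come from rounding of the Elapsed values.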
