Hi Steven,

How are you running xloader? Default settings?

What's the storage being used?
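
For reference, a typical run is something like the sketch below (the paths, file and option values are placeholders, not your actual command - please post the exact command line you used):
```
# Placeholder example of an xloader run - not the poster's actual command.
# --loc is the destination database directory, --tmpdir the workspace for
# intermediate files (as I recall from the xloader documentation).
tdb2.xloader --loc /data/DB2 --tmpdir /scratch/xloader-tmp dataset.nt.gz
```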

On 22/05/2023 10:49, Steven Blanchard wrote:
Hello,

I am currently trying to load a very large dataset (54 billion triples) with the tdb2.xloader command.

The first two steps (Nodes and Terms) completed with an average load speed of ~120,000.
The third step (Data) has an average load speed of only 800.

Is that "Avg" of 800 from the start of the phase, or does the average drop to 800 during the phase?

This average load speed is far too slow for the amount of data to be loaded.

Looking at the status of the job, it is possible that excessive demand on memory is slowing the process down severely.

Running top, we saw that the java process appears to be using a lot of memory:
```
top
#    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
# 867362 sblanch+  20   0  289,0g  90,2g  88,4g S   3,3 72,1  1102:32 java
```

xloader does not have much requirement for java heap memory.

That space may be mapped files.
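
If you want to check where that resident memory is going, something along these lines (867362 being the java PID from your top output) lists the largest mappings - my expectation is that most of it is file-backed mappings of the database and temporary files rather than Java heap:
```
# List the mappings of the loader JVM, largest resident size (RSS, column 3) last.
# Large entries naming database/temporary files are mapped files, not Java heap.
pmap -x 867362 | sort -k3 -n | tail -20
```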

But with free -g, we see that it actually uses very little memory.
```
free -g
#        total  used  free  shared  buff/cache  available
# Mem:     125     3     0       0         121        120
```

Are there any ways to speed up this step? (Give a -Xms to java?)
Can this significant drop in loading speed for this step be due to memory usage? Do you know of any other limiting causes in this loading step?
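
If you do want to experiment with the heap, my understanding is that the Jena command-line scripts pick up the JVM_ARGS environment variable - treat its use with tdb2.xloader as an assumption and check the script on your installation:
```
# Assumes the tdb2.xloader script honours JVM_ARGS like the other Jena scripts.
# Since the heap is not heavily used, a modest setting should be enough;
# the paths and values here are placeholders.
JVM_ARGS="-Xms2G -Xmx4G" tdb2.xloader --loc /data/DB2 dataset.nt.gz
```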

For previous loads of smaller datasets, this Data step was not a bottleneck and its average speed was even slightly higher than for the Nodes and Terms steps.

How small is "smaller"?

That sounds like what I see when loading.


For information, the machine used has 32 CPUs and 128 GB of RAM.

Thanks for your help,
Regards,

Steven


