I recently needed to load ~225M triples into a TDB triplestore, and when 
allocating only ~12G to the triple loader, I experienced the very same 
slowdowns you described. As an alternative, I reserved an on-demand, high 
memory (~60GB) instance in the public cloud, and the load completed in only a 
few hours. I then moved the resulting database files onto my local server and 
proceeded from there.
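
For what it's worth, a rough sketch of what that looked like (paths, heap 
size, and host names are illustrative, not the exact values I used; if I 
remember correctly the Jena command-line scripts pick up the heap setting 
from the JVM_ARGS environment variable):

    # on the high-memory machine: give the loader a generous heap and run the bulk load
    export JVM_ARGS="-Xmx32G"
    tdbloader --loc=/data/tdb dump.nt

    # afterwards, copy the finished TDB database directory down to the local server
    rsync -av /data/tdb/ me@local-server:/data/tdb/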

Aaron Coburn


On Feb 25, 2013, at 1:25 PM, Andy Seaborne <[email protected]> wrote:

> On 25/02/13 20:07, Joshua Greben wrote:
>> Hello All,
>> 
>> I am new to this list and to Jena and was wondering if anyone could
>> offer advice for loading a large triplestore.
>> 
>> I am trying to load 670M triples (N-Triples) into a store using tdbloader
>> on a single machine with 64-bit hardware and 8GB of memory. However, I am
>> running into a massive slowdown. When the load starts, tdbloader
>> processes around 30K tps, but by the time it has loaded 130M triples it
>> slows to around 2,300 tps and essentially stops making progress. At that
>> point I have to kill the process because it will basically never finish.
>> 
>> Is 8GB of memory enough or is there a more efficient way to load this
>> data? I am trying to load the data into a single DB location. Should
>> I be splitting up the triples and loading them into different DBs?
>> 
>> Advice from anyone who has experience successfully loading a large
>> triplestore is much appreciated.
> 
> Only 8G is pushing it somewhat for 670M triples.  It will finish; it will 
> take a very long time.  Faster loads have been reported by using a larger 
> machine (e.g. Freebase in 8 hours on an IBM Power7 with 48G RAM).
> 
> tdbloader2 (Linux only) may get you there a bit quicker but really you need a 
> bigger machine.
> 
> Once built, you can copy the dataset as files to other machines.
> 
>       Andy
> 
>> 
>> Thanks!
>> 
>> - Josh
>> 
>> 
>> 
>> Joshua Greben
>> Library Systems Programmer & Analyst
>> Stanford University Libraries
>> (650) 714-1937
>> [email protected]
>> 
>> 
>> 
> 
