I recently needed to load ~225M triples into a TDB triplestore, and when allocating only ~12G to the triple loader, I hit the very same slowdowns you describe. As an alternative, I reserved an on-demand, high-memory (~60GB) instance in the public cloud, and the load completed in only a few hours. I then moved the resulting files onto my local server and proceeded from there.
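In outline, the process looks something like this (a sketch only, not my exact commands: the heap size, file names, paths, and hostname below are placeholders):

    # On the high-memory cloud instance: give the loader a large heap.
    # Jena's bin scripts pick up JVM options from the JVM_ARGS variable.
    export JVM_ARGS="-Xmx48G"

    # Bulk-load the N-Triples files into a fresh TDB directory.
    tdbloader --loc=/data/tdb dump-*.nt

    # A TDB dataset is just a directory of files, so the finished store
    # can be copied straight down to the local server; nothing needs to
    # be re-loaded there.
    rsync -avz /data/tdb/ myserver.example.org:/var/lib/tdb/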
Aaron Coburn

On Feb 25, 2013, at 1:25 PM, Andy Seaborne <[email protected]> wrote:

> On 25/02/13 20:07, Joshua Greben wrote:
>> Hello All,
>>
>> I am new to this list and to Jena and was wondering if anyone could
>> offer advice for loading a large triplestore.
>>
>> I am trying to load 670M N-Triples into a store using tdbloader on a
>> single machine with 64-bit hardware and 8GB of memory. However, I am
>> running into a massive slowdown. When the load starts, the tdbloader
>> is processing around 30K tps, but by the time it has loaded 130M
>> triples it can essentially no longer load any more and slows down to
>> 2300 tps. At that point I have to kill the process because it will
>> basically never finish.
>>
>> Is 8GB of memory enough, or is there a more efficient way to load this
>> data? I am trying to load the data into a single DB location. Should
>> I be splitting up the triples and loading them into different DBs?
>>
>> Advice from anyone who has experience successfully loading a large
>> triplestore is much appreciated.
>
> Only 8G is pushing it somewhat for 670M triples. It will finish; it
> will take a very long time. Faster loads have been reported by using a
> larger machine (e.g. Freebase in 8 hours on an IBM Power7 and 48G RAM).
>
> tdbloader2 (Linux only) may get you there a bit quicker, but really you
> need a bigger machine.
>
> Once built, you can copy the dataset as files to other machines.
>
>     Andy
>
>> Thanks!
>>
>> - Josh
>>
>> Joshua Greben
>> Library Systems Programmer & Analyst
>> Stanford University Libraries
>> (650) 714-1937
>> [email protected]
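For reference, tdbloader2 is invoked much like tdbloader; it is a script that builds the node table and indexes with external sort utilities rather than inside the JVM heap, which tends to make it less sensitive to available memory. A minimal sketch (the location and file names are placeholders):

    # POSIX/Linux only; the target directory should be empty.
    tdbloader2 --loc=/data/tdb dump-*.nt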
