Re: Very slow tdbloader2 insertion

2017-04-18 Thread Andy Seaborne
On 17/04/17 23:07, Laura Morales wrote: tdbloader2 builds b+trees from bottom to top, given sorted input. As such blocks are streamed to disk which is disk-efficient. It is a series of java programs scripted together by a shell script. tdbloader is pure java. It builds the b+trees by

Re: Very slow tdbloader2 insertion

2017-04-17 Thread Laura Morales
> tdbloader2 builds b+trees from bottom to top, given sorted input. As such blocks are streamed to disk which is disk-efficient.
>
> It is a series of java programs scripted together by a shell script.
>
> tdbloader is pure java. It builds the b+trees by inserting, which for some indexes is

Re: Very slow tdbloader2 insertion

2017-04-17 Thread Andy Seaborne
tdbloader2 builds b+trees from bottom to top, given sorted input. As such blocks are streamed to disk which is disk-efficient. It is a series of java programs scripted together by a shell script. tdbloader is pure java. It builds the b+trees by inserting, which for some indexes is not

Re: Very slow tdbloader2 insertion

2017-04-15 Thread A. Soroka
To start with, tdbloader2 uses the assumption that the tuples are sorted (actually, it sorts them, then uses that assumption) as described in this old blog post of Andy's: https://seaborne.blogspot.com/2010/12/repacking-btrees.html That's one reason that you only want to use tdbloader2 to

Re: Very slow tdbloader2 insertion

2017-04-15 Thread Laura Morales
> Use tdbloader for 10M quads.
I wonder how tdbloader is technically different from tdbloader2. What makes tdbloader better suited to small and medium datasets, and tdbloader2 better suited to very large ones? Do they implement different insertion algorithms?

Re: Very slow tdbloader2 insertion

2017-04-15 Thread Andy Seaborne
Use tdbloader for 10M quads. As to why the load stage of tdbloader2 drops off, we'd need to know more about the environment you are running in. What is the machine? The disk? How much RAM does the machine have? Is there anything else running on the machine? Have you set the heap size or taken

Very slow tdbloader2 insertion

2017-04-15 Thread Laura Morales
I've made a dataset with about 10M nquads, 5-6 graphs, stored as a single .nq file. I've launched tdbloader2 to create a new dataset from this file, but I see a constant and remarkable slowdown as more nquads are added to the dataset. Here are some INFO lines from processing: INFO Add: 50,000