On 17/04/17 23:07, Laura Morales wrote:
tdbloader2 builds B+trees from the bottom up, given sorted input. As
such, blocks are streamed to disk, which is disk-efficient.
It is a series of Java programs scripted together by a shell script.
tdbloader is pure Java. It builds the B+trees by inserting, which for
some indexes is not
To start with, tdbloader2 uses the assumption that the tuples are sorted
(actually, it sorts them, then uses that assumption) as described in this old
blog post of Andy's:
https://seaborne.blogspot.com/2010/12/repacking-btrees.html
That's one reason that you only want to use tdbloader2 to
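
To make the "sort, then build bottom-up" idea concrete, here is a minimal sketch in Python. It is not Jena's code: the names `pack_level` and `bulk_load` and the tiny `fanout` are made up for illustration, and real TDB blocks hold far more entries. It only shows why sorted input lets each block be written once, in order, instead of being revisited by inserts.

```python
from typing import List, Tuple

Key = Tuple[int, int, int]   # hypothetical stand-in for an index tuple


def pack_level(keys: List[Key], fanout: int) -> List[List[Key]]:
    """Cut an already sorted key list into consecutive blocks of at most `fanout` keys."""
    return [keys[i:i + fanout] for i in range(0, len(keys), fanout)]


def bulk_load(tuples: List[Key], fanout: int = 4) -> List[List[List[Key]]]:
    """Build the tree level by level, leaves first.

    Returns a list of levels; each level is a list of blocks, and each
    block is a list of keys. The last level holds the single root block
    (or is empty if there was no input).
    """
    ordered = sorted(tuples)              # the sort step that precedes the build
    current = pack_level(ordered, fanout) # leaf blocks, filled left to right
    levels = [current]
    # Each upper level keeps the smallest key of every block below it.
    while len(current) > 1:
        separators = [block[0] for block in current]
        current = pack_level(separators, fanout)
        levels.append(current)
    return levels


if __name__ == "__main__":
    data = [(3, 1, 1), (1, 1, 1), (2, 1, 1), (5, 1, 1), (4, 1, 1)]
    for depth, level in enumerate(bulk_load(data, fanout=2)):
        print(depth, level)
```

Every block in a level is complete before the next one starts, so a loader working this way can append blocks to the file sequentially. An insert-based build has to find and possibly split an existing block for each tuple, which turns into random I/O once the index no longer fits in memory.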
> Use tdbloader for 10M quads.
I wonder how tdbloader is technically different from tdbloader2. What makes
tdbloader more suited for small/medium datasets and tdbloader2 more suited for
very large datasets? Do they implement different insertion algorithms?
Use tdbloader for 10M quads.
As to why the load stage of tdbloader2 drops off, we'd need to know more
about the environment you are running in.
What is the machine? The disk?
How much RAM does the machine have?
Is there anything else running on the machine?
Have you set the heap size or taken
I've made a dataset with about 10M nquads, 5-6 graphs, stored as a single .nq
file.
I've launched tdbloader2 to create a new dataset from this file, but I see a
constant and remarkable slowdown as more nquads are added to the dataset. Here
are some INFO lines from the processing:
INFO Add: 50,000