tdbloader2 builds the B+trees bottom-up from sorted input. Because of that, blocks are streamed to disk sequentially, which is efficient for disk I/O.

It is a series of Java programs tied together by a shell script.

tdbloader is pure Java. It builds the B+trees by inserting entries, which for some indexes is not optimal because the inserts land at random positions, leading to random I/O, which is bad for disk performance.
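
To make the contrast concrete, here is a minimal sketch in Python. It is not TDB's code: the function names, the block size of 4, and the in-memory lists standing in for disk blocks are all assumptions chosen for illustration. The first function packs sorted keys into leaf blocks left to right and then builds each index level above them, which is the bottom-up idea. The second is a rough proxy for random I/O: it counts how often consecutive inserts would land in a different leaf block, assuming unique keys.

```python
def build_bottom_up(sorted_keys, block_size=4):
    """Pack sorted keys into leaf blocks, then build each index level above them."""
    if not sorted_keys:
        return []
    # Leaf level: consecutive slices of the sorted input, produced left to right,
    # so each block could be written to disk once, in order.
    level = [sorted_keys[i:i + block_size]
             for i in range(0, len(sorted_keys), block_size)]
    levels = [level]
    # Each upper level holds the smallest key of every block in the level below.
    while len(level) > 1:
        separators = [block[0] for block in level]
        level = [separators[i:i + block_size]
                 for i in range(0, len(separators), block_size)]
        levels.append(level)
    return levels  # levels[0] is the leaf level, levels[-1] holds the root block


def leaf_switches(insert_order, block_size=4):
    """Count how often consecutive inserts target a different leaf block.

    Uses the final packed layout as the reference and assumes unique keys.
    """
    rank = {key: i for i, key in enumerate(sorted(insert_order))}
    leaves = [rank[key] // block_size for key in insert_order]
    return sum(1 for a, b in zip(leaves, leaves[1:]) if a != b)
```

With ten keys, sorted input moves from one leaf to the next only twice, while an interleaved order such as 0, 4, 8, 1, 5, 9, 2, 6, 3, 7 changes leaf on every insert. On a real disk each such change can mean a seek.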

    Andy



On 15/04/17 22:13, A. Soroka wrote:
To start with, tdbloader2 uses the assumption that the tuples are sorted 
(actually, it sorts them, then uses that assumption) as described in this old 
blog post of Andy's:

https://seaborne.blogspot.com/2010/12/repacking-btrees.html

That's one reason that you only want to use tdbloader2 to start from scratch.
Andy, of course, can say more.

---
A. Soroka
The University of Virginia Library

On Apr 15, 2017, at 2:58 PM, Laura Morales <laure...@mail.com> wrote:

Use tdbloader for 10M quads.

I wonder how tdbloader is technically different from tdbloader2. What makes
tdbloader more suited for small/medium datasets and tdbloader2 more suited for 
very large datasets? Do they implement different insertion algorithms?
