Re: Very slow tdbloader2 insertion

Andy Seaborne Mon, 17 Apr 2017 14:23:23 -0700

tdbloader2 builds b+trees from bottom to top, given sorted input. As such, blocks are streamed to disk, which is disk-efficient.
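To make the bottom-up build concrete, here is a minimal in-memory sketch in Python. It is not TDB's code: the `Leaf`/`Branch` types, `bulk_load`, and the `leaf_size`/`fanout` parameters are hypothetical, and real blocks live on disk rather than in lists. What it shows is that, with sorted input, each leaf is filled once, left to right, and each parent level is built from the finished level below, so no block is revisited.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    keys: List[int]


@dataclass
class Branch:
    separators: List[int]   # separators[i] is the smallest key under children[i + 1]
    children: List["Node"]


Node = Union[Leaf, Branch]


def min_key(node: Node) -> int:
    """Smallest key in a subtree: follow the leftmost children down to a leaf."""
    while isinstance(node, Branch):
        node = node.children[0]
    return node.keys[0]


def bulk_load(sorted_keys: Sequence[int], leaf_size: int = 4, fanout: int = 4) -> Node:
    """Hypothetical bottom-up B+tree build; only valid for sorted, distinct keys."""
    if any(a >= b for a, b in zip(sorted_keys, sorted_keys[1:])):
        raise ValueError("bulk_load needs strictly increasing keys")
    if not sorted_keys:
        return Leaf([])
    # Level 0: pack keys into leaves in order; each leaf is written once.
    level: List[Node] = [Leaf(list(sorted_keys[i:i + leaf_size]))
                         for i in range(0, len(sorted_keys), leaf_size)]
    # Build each parent level from the completed level below until one root remains.
    # The last node on a level may be underfull; a real loader would rebalance it.
    while len(level) > 1:
        parents: List[Node] = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Branch([min_key(c) for c in group[1:]], group))
        level = parents
    return level[0]


def search(node: Node, key: int) -> bool:
    """Descend by separators, then check the leaf."""
    while isinstance(node, Branch):
        idx = 0
        while idx < len(node.separators) and key >= node.separators[idx]:
            idx += 1
        node = node.children[idx]
    return key in node.keys
```

Because every level is produced strictly left to right, an on-disk version can append blocks sequentially instead of seeking back to split or update earlier ones.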


It is a series of Java programs scripted together by a shell script.

tdbloader is pure Java. It builds the b+trees by inserting, which for some indexes is not optimal because it causes random inserts, leading to random I/O, which is bad for disk performance.
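A rough way to see the "some indexes" effect: in a simplified model where a key's leaf is its rank in the finished index divided by the leaf size, count how often consecutive inserts move to a different leaf. Each move stands for a block that may have to be fetched from disk. The workload (random node-id triples arriving in subject order), `leaf_switches`, and `make_triples` are assumptions for illustration; SPO, POS and OSP are taken here as the key orders of TDB's three triple indexes.

```python
import random
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[int, int, int]

# Positions of (S, P, O) that make up each index's key, in key order.
ORDERS: Dict[str, Tuple[int, int, int]] = {
    "SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}


def index_key(t: Triple, order: Tuple[int, int, int]) -> Triple:
    return (t[order[0]], t[order[1]], t[order[2]])


def leaf_switches(triples: Sequence[Triple], order: Tuple[int, int, int],
                  leaf_size: int) -> int:
    """Count inserts that land in a different leaf than the previous insert
    (simplified model: leaf = rank in the final sorted index // leaf_size)."""
    keys = [index_key(t, order) for t in triples]
    rank = {k: r for r, k in enumerate(sorted(set(keys)))}
    switches, prev = 0, None
    for k in keys:
        leaf = rank[k] // leaf_size
        if leaf != prev:
            switches += 1
            prev = leaf
    return switches


def make_triples(n: int, seed: int = 0) -> List[Triple]:
    """Hypothetical workload: random node-id triples, sorted by subject."""
    rng = random.Random(seed)
    triples = {(rng.randrange(1000), rng.randrange(50), rng.randrange(5000))
               for _ in range(n)}
    return sorted(triples)
```

With this input the SPO index sees exactly one switch per leaf (the inserts walk the leaves in order), while POS jumps to a different leaf on nearly every insert. That jumping is the random I/O pattern that a bottom-up build from sorted input avoids.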


    Andy



On 15/04/17 22:13, A. Soroka wrote:

To start with, tdbloader2 uses the assumption that the tuples are sorted 
(actually, it sorts them, then uses that assumption) as described in this old 
blog post of Andy's:

https://seaborne.blogspot.com/2010/12/repacking-btrees.html
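The "sort first" step has to cope with more data than fits in RAM. A common way to do that is an external merge sort: sort fixed-size runs in memory, spill each run to a temporary file, then stream a k-way merge of the runs. The sketch below is a generic Python version (`external_sort`, `run_size`, and the text run format are hypothetical), not what tdbloader2 actually runs; its output is the kind of sorted stream a bottom-up build consumes.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, ...]


def _write_run(rows: List[Row], directory: str) -> str:
    """Sort one in-memory run and spill it to a temp file, one row per line."""
    rows.sort()
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for r in rows:
            f.write(" ".join(map(str, r)) + "\n")
    return path


def _read_run(path: str) -> Iterator[Row]:
    """Stream a run back from disk in its sorted order."""
    with open(path) as f:
        for line in f:
            yield tuple(int(x) for x in line.split())


def external_sort(rows: Iterable[Row], run_size: int, directory: str) -> Iterator[Row]:
    """Sort arbitrarily many rows using at most run_size rows of memory per run."""
    paths: List[str] = []
    buf: List[Row] = []
    for r in rows:
        buf.append(r)
        if len(buf) >= run_size:
            paths.append(_write_run(buf, directory))
            buf = []
    if buf:
        paths.append(_write_run(buf, directory))
    # k-way merge: reads each run sequentially and yields rows in global order.
    return heapq.merge(*(_read_run(p) for p in paths))
```

Both phases only read and write files sequentially, which is why sorting up front can be cheaper than inserting in arbitrary order. Consume the merged iterator before the temporary directory is removed.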

That's one reason that you only want to use tdbloader2 to start from scratch. 
Andy, of course, can say more.

---
A. Soroka
The University of Virginia Library

On Apr 15, 2017, at 2:58 PM, Laura Morales <laure...@mail.com> wrote:

Use tdbloader for 10M quads.


I wonder how tdbloader is technically different from tdbloader2. What makes 
tdbloader more suited for small/medium datasets and tdbloader2 more suited for 
very large datasets? Do they implement different insertion algorithms?

