Hey Paolo.
Thanks for your reply.
I used tdbloader2 with my own tokenizer / error handler (which just catches and skips errors and writes them to a file).
The command was ./tdbloader2 --loc=<store> <srcpath>/*

Is it possible to do incremental loads with the script files, or do I have to write my own program?

Regards,
Stefan

On 24.06.2012 10:42, Paolo Castagna wrote:
Hi Stefan,
as Rob said, loading data into an empty TDB store is different from loading
data into an existing TDB store.

I assume that for your second data load you used tdbloader not tdbloader2.

tdbloader2 does not even support incremental data loads (i.e. it will overwrite
your existing data). I suspect this is what is going on.

Can you share the exact commands you used as well as links to the RDF data?
(this way others can replicate your experiments).

Regards,
Paolo

Stefan Scheffler wrote:
Hello,
At the moment I am doing some performance checks on TDB. The first thing I
checked was the import with tdbloader2, and I got some weird results.
Maybe someone can help me out. Here are my test setup and the results.

The first test was to store 12 GB of triples into an empty store (I used
the German DBpedia).

Load time: 16 minutes
average loading: ca. 81,000 triples / second
index time: 40 minutes
store size: 9.3 GB


The second test was to store the same data into an already filled store.
Before I started the import, I created a store with 348,398,593 triples from
DNB and HBZ (which are German libraries; store size: 33 GB).
Then I started to load the German DBpedia into it.

Load time: 3 hours and 4 minutes
average loading: ca. 7,200 triples / second
index time: 38 minutes
store size: 19 GB!!!!!

Why does the loading time increase so immensely? My expectation was
that the index time would increase, but it does not. There were no other big
processes running at the same time. And why does the store size shrink to 19 GB? I
am totally confused about that point.
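
As a rough sanity check on the figures above, the load time multiplied by the average rate should give about the same triple count in both runs, since the same dataset was loaded. This is only a sketch: it assumes the reported average rate covers the load phase alone (not the index phase), and the function name is made up for illustration.

```python
def estimated_triples(load_seconds: int, triples_per_second: int) -> int:
    """Estimate how many triples were loaded from duration and average rate."""
    return load_seconds * triples_per_second


# First test: 16 minutes at ca. 81,000 triples / second (empty store).
first = estimated_triples(16 * 60, 81_000)

# Second test: 3 hours 4 minutes at ca. 7,200 triples / second (filled store).
second = estimated_triples((3 * 60 + 4) * 60, 7_200)

# Ratio of the two reported average rates.
slowdown = 81_000 / 7_200

print(first)               # 77760000
print(second)              # 79488000
print(round(slowdown, 2))  # 11.25
```

Both runs come out at roughly 78 to 79 million triples, so the reported times and rates are consistent with each other; the second load was about 11 times slower per triple.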

Kind regards,
Stefan


