On 12/11/2019 15:53, Amandeep Srivastava wrote:
Thanks for the heads-up, Dan. I'll go and check the archives.
I expect I'll find how to decide between TDB1 and TDB2 there.
For large bulk loads, the TDB2 loader is faster if you use
--loader=parallel (NB: it can take over your machine's I/O!).
See tdb2.tdbloader --help for the names of the built-in plans.
The only way to know which is best is to try, but the order of
thread usage is:
sequential < light < phased < parallel
(more threads does not always mean faster).
sequential is roughly the same as the TDB1 bulk loader.
parallel usually wins as the data gets larger (several hundred million
triples) if the machine has the I/O to handle it.
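Since trying is the only way to know which plan is best, one option is to time each plan on a sample of the dump before committing to the full load. The following is a minimal Python sketch under stated assumptions, not a Jena tool: it assumes tdb2.tdbloader is on the PATH, uses only the --loader and --loc options, and the names PLANS, loader_command, time_plans and the "bench" directory are hypothetical. Check tdb2.tdbloader --help for the exact plan names your Jena version ships.

```python
# Hypothetical benchmark harness (not part of Jena): time tdb2.tdbloader
# with each loader plan on a sample file, loading into a fresh database each time.
from __future__ import annotations

import shutil
import subprocess
import time
from pathlib import Path

# Plans in increasing order of thread usage, as listed in the thread.
# Assumption: these names match `tdb2.tdbloader --help` for your version.
PLANS = ["sequential", "light", "phased", "parallel"]


def loader_command(plan: str, db_dir: str, data_file: str) -> list[str]:
    """Build the argv for one tdb2.tdbloader run using the given plan."""
    if plan not in PLANS:
        raise ValueError(f"unknown loader plan: {plan}")
    return ["tdb2.tdbloader", f"--loader={plan}", f"--loc={db_dir}", data_file]


def time_plans(data_file: str, work_dir: str = "bench") -> dict[str, float]:
    """Load data_file once per plan into an empty database; return wall-clock seconds."""
    timings: dict[str, float] = {}
    for plan in PLANS:
        db_dir = Path(work_dir) / f"db-{plan}"
        shutil.rmtree(db_dir, ignore_errors=True)  # start each run from an empty database
        start = time.perf_counter()
        # check=True raises CalledProcessError if the loader exits non-zero.
        subprocess.run(loader_command(plan, str(db_dir), data_file), check=True)
        timings[plan] = time.perf_counter() - start
    return timings
```

Usage would be something like `time_plans("sample.nt")` on a slice of the dump. Bear in mind that timings on a small sample may favour the lighter plans; the advantage of parallel tends to show only at several hundred million triples, and only if the disks can keep up.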
Andy
On Tue, 12 Nov, 2019, 8:59 PM Dan Pritts wrote:
Look through the list archives for posts from Andy describing the
differences between TDB1 and TDB2. They have different optimizations; I
don't recall the details.
thanks
danno
Dan Pritts
ICPSR Computing and Network Services
On 12 Nov 2019, at 7:29, Amandeep Srivastava wrote:
Hi,
I'm trying to create a TDB database from Wikidata's official RDF dump to
read the data using the Fuseki service. I need to run a few queries for my
personal project, and they time out on the online service.
I have a 12 core machine with 36 GB memory.
Can you please advise on the best way to create the database? Since the
dump is huge, I cannot try all the approaches. Besides, I'm not sure if the
tdbloader works the same way on data of different sizes.
Questions:
1. Which one would be better to use for creating the database,
tdb.tdbloader2 (TDB1) or tdb2.tdbloader (TDB2), and why? Are there any
specific configurations that I should be aware of?
2. I'm currently running a job using tdb.tdbloader2, but it is using just a
single core. Also, its loading speed is slowly decreasing: it started at an
average of 120k tuples per second and is currently at 80k. How can I
utilize all the cores of my machine while maintaining the loading speed?
Regards,
Aman