Thanks, Andy, for the detailed explanation :)

On Wed, 13 Nov, 2019, 4:52 PM Andy Seaborne, <[email protected]> wrote:
> On 12/11/2019 15:53, Amandeep Srivastava wrote:
> > Thanks for the heads up, Dan. Will go and check the archives.
> >
> > I think I should be able to work out how to decide between TDB1 and
> > TDB2 from the archives themselves.
>
> For large bulk loads, the TDB2 loader is faster, if you use
> --loader=parallel (NB it can take over your machine's I/O!)
>
> See tdb2.tdbloader --help for names of plans that are built-in.
>
> The only way to know which is best is to try, but the order of
> threading used is:
>
> sequential < light < phased < parallel
>
> (more threads does not always mean faster).
>
> sequential is roughly the same as the TDB1 bulk loader.
>
> parallel usually wins as data gets larger (several 100m) if the machine
> has the I/O to handle it.
>
>      Andy
>
> > On Tue, 12 Nov, 2019, 8:59 PM Dan Pritts, <[email protected]> wrote:
> >
> >> Look through the list archives for posts from Andy describing the
> >> differences between tdb1 and tdb2. They have different optimizations; I
> >> don't recall the differences.
> >>
> >> thanks
> >> danno
> >>
> >> Dan Pritts
> >> ICPSR Computing and Network Services
> >>
> >> On 12 Nov 2019, at 7:29, Amandeep Srivastava wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm trying to create a TDB database from Wikidata's official RDF dump
> >>> so that I can read the data through a Fuseki service. I need to run a
> >>> few queries for my personal project, and they time out on the online
> >>> service.
> >>>
> >>> I have a 12 core machine with 36 GB memory.
> >>>
> >>> Can you please advise on the best way to create the database? Since
> >>> the dump is huge, I cannot try all the approaches. Besides, I'm not
> >>> sure if the tdbloader function works in a similar way on data of
> >>> different sizes.
> >>>
> >>> Questions:
> >>>
> >>> 1. Which one would be better to use - tdb.tdbloader2 (TDB1) or
> >>> tdb2.tdbloader (TDB2) for creating the database and why? Any specific
> >>> configurations that I should be aware of?
> >>>
> >>> 2. I'm running a job currently using tdb.tdbloader2 but it is using
> >>> just a single core. Also, its loading speed is decreasing slowly. It
> >>> started at an avg of 120k tuples and is currently at 80k tuples. Can
> >>> you advise how I can utilize all the cores of my machine and maintain
> >>> the loading speed at the same time?
> >>>
> >>> Regards,
> >>> Aman
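
Andy's advice in this thread is that the only way to know which loader plan is best is to try them. As a rough sketch of what that could look like, the Python below only builds and prints one tdb2.tdbloader command line per plan named in his reply (sequential, light, phased, parallel); it does not run anything. The flag spellings (--loader=<plan>, --loc <dir>) are assumptions to check against `tdb2.tdbloader --help` on your Jena version, and the database directories and `sample.nt.gz` are hypothetical names, not anything from the thread.

```python
import shlex

# Plan names as listed in the thread, ordered from least to most threading.
LOADER_PLANS = ("sequential", "light", "phased", "parallel")


def build_loader_command(plan, db_dir, data_files):
    """Return the argv list for one tdb2.tdbloader run (nothing is executed)."""
    if plan not in LOADER_PLANS:
        raise ValueError(f"unknown loader plan: {plan!r}")
    if not data_files:
        raise ValueError("at least one data file is required")
    # Assumption: flags are spelled --loader=<plan> and --loc <dir>;
    # confirm with `tdb2.tdbloader --help` before relying on this.
    return ["tdb2.tdbloader", f"--loader={plan}", "--loc", db_dir, *data_files]


if __name__ == "__main__":
    # Hypothetical paths: a small sample cut from the dump, one database per plan.
    for plan in LOADER_PLANS:
        argv = build_loader_command(plan, f"DB-{plan}", ["sample.nt.gz"])
        print(shlex.join(argv))
```

Timing each printed command on a sample of the dump is one way to compare plans without loading the full dataset several times. Keep in mind that, per the reply above, parallel tends to win only as the data gets larger, so results from a small sample may not carry over to the whole dump.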
