Re: TDB optimization query

Amandeep Srivastava Wed, 13 Nov 2019 04:08:53 -0800

Thanks, Andy, for the detailed explanation :)

On Wed, 13 Nov, 2019, 4:52 PM Andy Seaborne, <[email protected]> wrote:


>
>
> On 12/11/2019 15:53, Amandeep Srivastava wrote:
> > Thanks for the heads up, Dan. Will go and check the archives.
> >
> > I think I should be able to find out how to decide between TDB1 and TDB2
> > in the archives themselves.
>
> For large bulk loads, the TDB2 loader is faster if you use
> --loader=parallel (NB it can take over your machine's I/O!)
>
> See tdb2.tdbloader --help for names of plans that are built-in.
>
> The only way to know which is best is to try but
>
>
> The order of threading used is:
>
> sequential < light < phased < parallel
>
> (more threads does not always mean faster).
>
> sequential is roughly the same as the TDB1 bulk loader.
>
> parallel usually wins as data gets larger (several hundred million
> triples) if the machine has the I/O to handle it.
>
>      Andy
>
> >
> > On Tue, 12 Nov, 2019, 8:59 PM Dan Pritts, <[email protected]> wrote:
> >
> >> Look through the list archives for posts from Andy describing the
> >> differences between TDB1 and TDB2. They have different optimizations; I
> >> don't recall the differences.
> >>
> >> thanks
> >> danno
> >>
> >> Dan Pritts
> >> ICPSR Computing and Network Services
> >>
> >> On 12 Nov 2019, at 7:29, Amandeep Srivastava wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm trying to create a TDB database from Wikidata's official RDF dump
> >>> so that I can read the data through a Fuseki service. I need to run a
> >>> few queries for my personal project, and they time out on the online
> >>> service.
> >>>
> >>> I have a 12 core machine with 36 GB memory.
> >>>
> >>> Can you please advise on the best way to create the database? Since
> >>> the dump is huge, I cannot try all the approaches. Besides, I'm not
> >>> sure whether tdbloader behaves similarly on data of different sizes.
> >>>
> >>> Questions:
> >>>
> >>> 1. Which one would be better to use - tdb.tdbloader2 (TDB1) or
> >>> tdb2.tdbloader (TDB2) for creating the database and why? Any specific
> >>> configurations that I should be aware of?
> >>>
> >>> 2. I'm currently running a job using tdb.tdbloader2, but it is using
> >>> just a single core. Also, its loading speed is slowly decreasing. It
> >>> started at an avg of 120k tuples and is currently at 80k tuples. Can
> >>> you advise how I can utilize all the cores of my machine and maintain
> >>> the loading speed at the same time?
> >>>
> >>> Regards,
> >>> Aman
> >>
> >
>
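
As a rough sketch of the advice quoted above (try the built-in plans and
compare), the following prints one tdb2.tdbloader command line per loader
plan without running anything. The --loader= and --loc options are the ones
I understand tdb2.tdbloader to accept; check tdb2.tdbloader --help on your
version. The database directory names and the input file data.nt are
placeholders, and loader_cmd is a hypothetical helper, not part of Jena.

```shell
#!/bin/sh
# Dry run: print (do not execute) one tdb2.tdbloader command per loader plan.
# loader_cmd is a hypothetical helper; DB2-<plan> and data.nt are placeholders.
loader_cmd() {
    # $1 = loader plan, $2 = database directory, $3 = RDF input file
    printf 'tdb2.tdbloader --loader=%s --loc %s %s\n' "$1" "$2" "$3"
}

# Plans in the order given in the thread, least to most threading.
for plan in sequential light phased parallel; do
    loader_cmd "$plan" "DB2-$plan" data.nt
done
```

Running each printed command on a small sample of the dump first, and
watching the load rate, is probably cheaper than committing the full dump
to one plan up front.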
