Re: TDB optimization query

Amandeep Srivastava Wed, 13 Nov 2019 18:55:40 -0800

I was trying to test the performance of tdb.tdbloader2 by creating a TDB
database. The loader failed at the sort SPO step. The failure seems to be
caused by insufficient space in the /tmp folder. Can we point tdb at
another folder to use instead of /tmp?


Error log:
sort: write failed: /tmp/sortxRql3B: No space left on device
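
A possible workaround, sketched under assumptions: GNU sort writes its
temporary files to the directory named by the TMPDIR environment variable
(or by its -T option), so starting the loader with TMPDIR set to a larger
volume may avoid the error. Whether the tdbloader2 script passes the
environment through to sort unchanged is an assumption here, and the
helper names, the /data/tmp path, and the file names are hypothetical.

```python
import os
import shutil


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def env_with_tmpdir(tmpdir, base_env=None):
    """Return a copy of the environment with TMPDIR set to `tmpdir`.

    GNU sort spills to $TMPDIR when no -T option is given.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TMPDIR"] = tmpdir
    return env


def loader_command(db_dir, data_file):
    """Hypothetical loader invocation; adjust to your installation."""
    return ["tdbloader2", "--loc", db_dir, data_file]
```

It could be used as subprocess.run(loader_command("DB", "dump.nt"),
env=env_with_tmpdir("/data/tmp"), check=True), after checking
free_gib("/data/tmp") against the expected size of the sort files.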

On Wed, 13 Nov, 2019, 5:37 PM Amandeep Srivastava, <
[email protected]> wrote:

> Thanks, Andy, for the detailed explanation :)
>
> On Wed, 13 Nov, 2019, 4:52 PM Andy Seaborne, <[email protected]> wrote:
>
>>
>>
>> On 12/11/2019 15:53, Amandeep Srivastava wrote:
>> > Thanks for the heads up, Dan. Will go and check the archives.
>> >
>> > I think I should get how to decide between tdb and TDB2 in the archives
>> > itself.
>>
>> For large bulk loads, the TDB2 loader is faster if you use
>> --loader=parallel (NB it can take over your machine's I/O!)
>>
>> See tdb2.tdbloader --help for names of plans that are built-in.
>>
>> The only way to know which is best is to try, but as a guide, the
>> loaders in order of increasing threading are:
>>
>> sequential < light < phased < parallel
>>
>> (more threads does not always mean faster).
>>
>> sequential is roughly the same as the TDB1 bulk loader.
>>
>> parallel usually wins as data gets larger (several 100m) if the machine
>> has the I/O to handle it.
>>
>>      Andy
>>
>> >
>> > On Tue, 12 Nov, 2019, 8:59 PM Dan Pritts, <[email protected]> wrote:
>> >
>> >> Look through the list archives for posts from Andy describing the
>> >> differences between TDB1 and TDB2. They have different optimizations; I
>> >> don't recall the differences.
>> >>
>> >> thanks
>> >> danno
>> >>
>> >> Dan Pritts
>> >> ICPSR Computing and Network Services
>> >>
>> >> On 12 Nov 2019, at 7:29, Amandeep Srivastava wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I'm trying to create a TDB database from Wikidata's official RDF dump
>> >>> so that I can read the data through a Fuseki service. I need to run a
>> >>> few queries for my personal project, and they time out on the online
>> >>> service.
>> >>>
>> >>> I have a 12 core machine with 36 GB memory.
>> >>>
>> >>> Can you please advise on the best way to create the database? Since
>> >>> the dump is huge, I cannot try all the approaches. Besides, I'm not
>> >>> sure if the tdbloader function works in a similar way on data of
>> >>> different sizes.
>> >>>
>> >>> Questions:
>> >>>
>> >>> 1. Which one would be better to use - tdb.tdbloader2 (TDB1) or
>> >>> tdb2.tdbloader (TDB2) for creating the database and why? Any specific
>> >>> configurations that I should be aware of?
>> >>>
>> >>> 2. I'm currently running a job using tdb.tdbloader2, but it is using
>> >>> just a single core. Also, its loading speed is decreasing slowly. It
>> >>> started at an avg of 120k tuples and is currently at 80k tuples. Can
>> >>> you advise how I can utilize all the cores of my machine and maintain
>> >>> the loading speed at the same time?
>> >>>
>> >>> Regards,
>> >>> Aman
>> >>
>> >
>>
>
