I was trying to test the performance of tdb.tdbloader2 by creating a TDB database. The loader failed at the SPO sort step, apparently because the /tmp folder ran out of space. Can we point TDB at a different folder instead of /tmp?
Error log:

sort: write failed: /tmp/sortxRql3B: No space left on device

On Wed, 13 Nov, 2019, 5:37 PM Amandeep Srivastava wrote:

> Thanks, Andy, for the detailed explanation :)
>
> On Wed, 13 Nov, 2019, 4:52 PM Andy Seaborne wrote:
>
>> On 12/11/2019 15:53, Amandeep Srivastava wrote:
>> > Thanks for the heads up, Dan. Will go and check the archives.
>> >
>> > I think I should find how to decide between TDB and TDB2 in the
>> > archives themselves.
>>
>> For large bulk loads, the TDB2 loader is faster if you use
>> --loader-parallel (NB: it can take over your machine's I/O!)
>>
>> See tdb2.tdbloader --help for the names of the built-in plans.
>>
>> The only way to know which is best is to try.
>>
>> The order of threading used is:
>>
>> sequential < light < phased < parallel
>>
>> (more threads does not always mean faster).
>>
>> sequential is roughly the same as the TDB1 bulk loader.
>>
>> parallel usually wins as the data gets larger (several 100m) if the
>> machine has the I/O to handle it.
>>
>> Andy
>>
>> > On Tue, 12 Nov, 2019, 8:59 PM Dan Pritts wrote:
>> >
>> >> Look through the list archives for posts from Andy describing the
>> >> differences between TDB1 and TDB2. They have different optimizations;
>> >> I don't recall the differences.
>> >>
>> >> thanks
>> >> danno
>> >>
>> >> Dan Pritts
>> >> ICPSR Computing and Network Services
>> >>
>> >> On 12 Nov 2019, at 7:29, Amandeep Srivastava wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I'm trying to create a TDB database from Wikidata's official RDF dump
>> >>> so I can query the data through a Fuseki service. I need to run a few
>> >>> queries for my personal project, and they time out on the online
>> >>> service.
>> >>>
>> >>> I have a 12-core machine with 36 GB of memory.
>> >>>
>> >>> Can you please advise on the best way to create the database? Since
>> >>> the dump is huge, I cannot try all the approaches. Besides, I'm not
>> >>> sure whether the tdbloader function behaves the same way on data of
>> >>> different sizes.
>> >>>
>> >>> Questions:
>> >>>
>> >>> 1. Which one would be better to use, tdb.tdbloader2 (TDB1) or
>> >>> tdb2.tdbloader (TDB2), for creating the database, and why? Are there
>> >>> any specific configurations I should be aware of?
>> >>>
>> >>> 2. I'm currently running a job using tdb.tdbloader2, but it is using
>> >>> just a single core. Also, its loading speed is slowly decreasing. It
>> >>> started at an average of 120k tuples and is currently at 80k tuples.
>> >>> Can you advise how I can utilize all the cores of my machine and
>> >>> maintain the loading speed at the same time?
>> >>>
>> >>> Regards,
>> >>> Aman
