Couple of possibilities:

1) Get something other than RDF/XML from Gutenberg. I don't mean that to sound flippant. They may very well maintain some other representation (N-Triples, Turtle, etc.) for their own use, and they might be willing to share it. It's worth an email. Then load that with SOH (the s-put/s-post scripts).

2A) Convert your stuff to a single N-Triples (streamable) file and load it into a TDB database locally, then put that database on the server. You can use riot for the conversion (it accepts more than one filename), but with that many files you may need to do it in several stages or groups, or use xargs or the like. Whether this works for you depends on whether you have access to the server to install a TDB database directly into Fuseki, or can only reach it via HTTP.

2B) Convert your stuff to a single N-Triples (streamable) file using riot and load it via SOH.
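
For the conversion step in 2A/2B, something along these lines might do. It is only a rough sketch in Python, not tested against the Gutenberg dump. It assumes riot is on your PATH, that it accepts --output=ntriples, and that the records are *.rdf files somewhere under one directory; the function names, the batch size and the paths are made up for illustration. Because N-Triples is one triple per line, the output of each batch can simply be appended to the same file.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def riot_command(files: List[str]) -> List[str]:
    """Build one riot call that parses `files` and prints N-Triples on stdout.

    Assumption: the installed riot understands --output=ntriples.
    """
    return ["riot", "--output=ntriples", *files]


def convert_all(src_dir: str, out_file: str, batch_size: int = 500) -> int:
    """Convert every *.rdf file under src_dir into a single N-Triples file.

    Files are handed to riot in batches so that no single command line
    gets too long. Returns the number of input files found.
    """
    files = sorted(str(p) for p in Path(src_dir).rglob("*.rdf"))
    with open(out_file, "wb") as out:
        for group in batches(files, batch_size):
            # Each riot run writes after the previous one in the same file.
            # check=True stops the loop if riot reports a failure.
            subprocess.run(riot_command(group), stdout=out, check=True)
    return len(files)
```

Then, for example, convert_all("rdf-files", "gutenberg.nt"), followed by either tdbloader --loc=DB gutenberg.nt (for 2A) or s-put http://localhost:3030/ds/data default gutenberg.nt (for 2B). The directory, the DB location and the dataset URL there are placeholders for whatever your setup uses.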

ajs6f

Andrew U. Frank wrote on 10/7/17 10:17 AM:
i have to load the Gutenberg projects catalog in rdf/xml format. this is a collection of about 50,000 files, each containing a single record as attached.

if i try to concatenate these files into a single one the result is not legal rdf/xml - there are xml doc headers:

<rdf:RDF xml:base="http://www.gutenberg.org/">

and similar, which can only occur once per file.

i found a way to load each file individually with s-put and a loop, but this runs extremely slowly - it has already been running for more than 10 hours; each file takes half a second to load (fuseki running as localhost).

i am sure there is a better way?

thank you for the help!

andrew


