Re: Graph Store compared to tdb2.tdbloader

Andy Seaborne Mon, 30 May 2022 00:43:09 -0700

Hi David,

On 30/05/2022 07:27, Lorenz Buehmann wrote:

Hi David,
On 29.05.22 15:34, David Habgood wrote:
Hi,

I've been running Apache Jena Fuseki 4.5.0 in a docker container. I've
loaded data to it two ways: through the graph store protocol, and using
tdb2.tdbloader before starting Jena Fuseki. No issues with either, however
I'm interested in what differences the two methods have.
With the graph store protocol, I can put larger RDF files 'close' to where
the docker container is running and handle any network issues, so the loads
have been fine. Loading data this way is convenient and allows updates
while Jena Fuseki is running. Are indexes continually updated as more data
is loaded through the graph store protocol?


Yes.  The storage database is updated as data is loaded.

Are there any other
disadvantages to this method or reasons it (may) not be advised for large
datasets? Conversely, I'm aware tdb2.tdbloader can load large datasets, is
there any reason/s it should be used over graph store protocol?

The difference is how fast the data is loaded. The graph store protocol
doesn't do anything special for large data - it transactionally loads
the incoming stream.
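
For illustration, a graph store protocol load is a single HTTP request carrying the RDF as its body. The following is a minimal Python sketch, not part of the original thread. The endpoint URL `http://localhost:3030/ds/data`, the dataset name `ds` and the helper names `build_gsp_request` and `send` are assumptions made for the example; adjust them to the actual Fuseki configuration.

```python
import urllib.parse
import urllib.request


def build_gsp_request(dataset_url, rdf_bytes, graph_uri=None,
                      content_type="text/turtle", replace=False):
    """Build (but do not send) a SPARQL Graph Store Protocol request.

    dataset_url is the GSP endpoint of the dataset, assumed here to look
    like http://localhost:3030/ds/data (hypothetical name).
    """
    # GSP targets the default graph with ?default and a named graph
    # with ?graph=<percent-encoded URI>.
    if graph_uri is None:
        query = "default"
    else:
        query = urllib.parse.urlencode({"graph": graph_uri})
    # POST adds triples to the target graph; PUT replaces its contents.
    method = "PUT" if replace else "POST"
    return urllib.request.Request(
        url=f"{dataset_url}?{query}",
        data=rdf_bytes,
        method=method,
        headers={"Content-Type": content_type},
    )


def send(request):
    """Send the request to a running server and return the HTTP status."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `send(build_gsp_request("http://localhost:3030/ds/data", open("data.ttl", "rb").read()))` against a running server would add the file's triples to the default graph in one transaction, which is the behaviour described above.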

The various varieties of loaders have one task - load large data. They
manipulate the internal data structures of TDB directly, need exclusive
access and only apply when loading an initially empty database. If data
is already present, the loader command does a simple load like GSP.

("Large" being 100 million+ - hardware dependent and to some extent data
shape dependent as to the cut-over).


So less convenient but faster at scale.
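
As a rough sketch of the offline route (again not from the original thread), the bulk loader can be driven from Python while Fuseki is stopped. The helper name `build_tdbloader_command` and the paths in the usage note are hypothetical. `--loc` and `--loader=` are the tdb2.tdbloader options for the database directory and the loader variant; the sketch assumes the command is on the PATH.

```python
def build_tdbloader_command(db_dir, rdf_files, loader=None):
    """Return the argument list for a tdb2.tdbloader bulk load.

    db_dir is the TDB2 database directory; it should be empty (or not
    yet exist) for the bulk loader to apply. loader optionally names a
    loader variant, for example "parallel".
    """
    cmd = ["tdb2.tdbloader", "--loc", db_dir]
    if loader is not None:
        cmd.append(f"--loader={loader}")
    # One or more RDF files to load, in the order given.
    cmd.extend(rdf_files)
    return cmd
```

The command would then be run with something like `subprocess.run(build_tdbloader_command("/fuseki/databases/ds", ["data.nt.gz"]), check=True)` before starting Fuseki, because the loader needs exclusive access to the database.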

Are there any other methods I should be considering (other than SPARQL
INSERT)?


Those are all the data loading methods.

I'll also be running GeoSPARQL Jena for some instances, and needing to
spatially index data. I think this will necessitate using tdb2.tdbloader
and generating the spatial index 'offline' before starting Jena/Fuseki - or
are there other ways?

At least when you have the GeoSPARQL layer enabled in your Fuseki
assembler config, the index should be computed on the first start of
Fuseki just once and serialized at the configured destination. Only the
text index has to be generated offline before

Thanks
David Habgood
