Also interested in answers. I asked Andy a few years ago and I believe he said it was best done via SPARQL, which went against my intuition that some fast path shortcuts would help. I guess he meant this:
https://jena.apache.org/documentation/query/update.html

Looking at
https://github.com/apache/jena/blob/main/jena-examples/src/main/java/arq/examples/update/UpdateProgrammatic.java
I realize I am unsure whether the file: URI is on the client or server side of
the protocol:

  UpdateLoad("file:etc/update-data.ttl", "http://example/g2")

The reason I care comes from a sense that we now have a lot of interoperable
data in RDF, and many SPARQL implementations with different strengths and
weaknesses. Yet it is hard to systematically mix and match them, e.g. in a
cloud (Docker etc.) environment, without a lot of work on data loading. We are
25+ years into RDF now, but hopefully things will continue to get easier!

Excuse the top-posting,

Dan

On Mon, 16 Jan 2023 at 13:18, Steven Blanchard <[email protected]> wrote:
> Hello,
>
> I would like to upload a very large dataset (UniRef) to a Fuseki database.
> I tried to upload it file by file, but the upload time grew exponentially
> with each file added.
>
> Code used:
> ```python
> import requests
> from requests_toolbelt import MultipartEncoder
>
> url: str = f"{jena_url}/{db_name}/data"
> multipart_data: MultipartEncoder = MultipartEncoder(
>     fields={
>         "file": (
>             f"{file_name}",
>             open(f"{path_file}", "rb"),
>             "text/turtle",
>         )
>     }
> )
> response: requests.Response = requests.post(
>     url,
>     data=multipart_data,
>     headers={"Content-Type": multipart_data.content_type},
>     cookies=cookies,
> )
> ```
>
> Then I tried to load the data with the tdb2.tdbloader command.
> By loading all the files in one command, the upload became very much faster.
> tdb2.tdbloader also has an option to parallelize the load.
>
> Code used:
> ```bash
> bin/tdb2.tdbloader --loader=parallel --loc fuseki/base/databases/uniref/ data/uniref_*
> ```
>
> The problem with tdb2.tdbloader is that it does not work over HTTP.
>
> I would like to know if it is possible to get the same performance as
> tdb2 (loading all files at once, parallelization, ...) using an HTTP request.
> I am also open to other suggestions for optimizing this file loading.
>
> What explains this exponential growth of the upload time when adding
> data in several batches?
>
> Thank you for your help,
>
> Steven
>
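
For concreteness, a minimal sketch of the SPARQL Update route mentioned above, sent over HTTP to a Fuseki update endpoint; the endpoint URL, graph IRI, and file path are assumptions, not anything from the thread. As far as I can tell from the SPARQL 1.1 Update spec, the LOAD source IRI is dereferenced by the service executing the update, so with a remote endpoint a file: URI would need to be readable from the server's filesystem.

```python
# Sketch: send a SPARQL Update LOAD to a Fuseki update endpoint.
# Endpoint URL, source URI, and graph IRI below are assumptions.
import requests

fuseki_update = "http://localhost:3030/uniref/update"  # assumed update endpoint

# The file: URI is resolved by the server, not the client.
update = "LOAD <file:///data/uniref_part1.ttl> INTO GRAPH <http://example/g2>"

resp = requests.post(
    fuseki_update,
    data=update.encode("utf-8"),
    headers={"Content-Type": "application/sparql-update"},
)
resp.raise_for_status()
```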

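For comparison with the multipart upload in the quoted code, here is a sketch of streaming a single Turtle file straight to the dataset's data endpoint using the SPARQL Graph Store Protocol; the dataset name and file path are assumptions. This does not by itself match tdb2.tdbloader's bulk-load speed, but it drops the multipart encoding layer and streams the file body as-is.

```python
# Sketch: stream one Turtle file to Fuseki via the Graph Store Protocol.
# Dataset name and file path are assumptions.
import requests

gsp_url = "http://localhost:3030/uniref/data"  # assumed dataset data endpoint

with open("data/uniref_part1.ttl", "rb") as f:
    resp = requests.post(
        gsp_url,
        params="default",  # target the default graph; use graph=<iri> for a named graph
        data=f,            # file object is streamed, no multipart encoding
        headers={"Content-Type": "text/turtle"},
    )
resp.raise_for_status()
```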