Hello,
I would like to upload a very large dataset (UniRef) to a Fuseki
database.
I first tried to upload the files one by one, but the upload time grew
exponentially with each file added.
Code used:
```python
import requests
from requests_toolbelt import MultipartEncoder

# POST one Turtle file to the dataset's Graph Store endpoint
url: str = f"{jena_url}/{db_name}/data"
multipart_data: MultipartEncoder = MultipartEncoder(
    fields={
        "file": (
            file_name,
            open(path_file, "rb"),
            "text/turtle",
        )
    }
)
response: requests.Response = requests.post(
    url,
    data=multipart_data,
    headers={"Content-Type": multipart_data.content_type},
    cookies=cookies,
)
```
Then I tried loading with the command-line tool tdb2.tdbloader.
By loading all the files in a single command, the load became very much
faster. tdb2.tdbloader also has an option to parallelize the load.
Code used:
```bash
bin/tdb2.tdbloader --loader=parallel --loc fuseki/base/databases/uniref/ data/uniref_*
```
The problem with tdb2.tdbloader is that it does not work over HTTP.
I would like to know whether it is possible to get the same performance
as tdb2.tdbloader (loading all files at once, parallelization, ...)
using HTTP requests.
I'm also open to other suggestions for optimizing this file loading.
What explains this exponential growth of the upload time when the data
is added in several batches?
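For example, I wondered whether sending everything in a single streaming
request would help. A minimal sketch of what I have in mind (same
`jena_url`/`db_name`/`cookies` variables as above; I am assuming here the
files are N-Triples, since Turtle files with prefix directives cannot be
naively concatenated):

```python
import glob
from typing import Iterator, Optional

import requests


def stream_files(paths: list[str], chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the contents of all the files one chunk at a time, so the
    whole dataset never has to fit in memory."""
    for path in paths:
        with open(path, "rb") as handle:
            while chunk := handle.read(chunk_size):
                yield chunk


def upload_all(jena_url: str, db_name: str, paths: list[str],
               cookies: Optional[dict] = None) -> requests.Response:
    # Passing a generator as `data=` makes requests send the body with
    # chunked transfer encoding, i.e. one POST covering all the files.
    return requests.post(
        f"{jena_url}/{db_name}/data",  # same Graph Store endpoint as above
        data=stream_files(paths),
        headers={"Content-Type": "application/n-triples"},
        cookies=cookies,
    )


# Usage idea (untested against my server):
# upload_all(jena_url, db_name, sorted(glob.glob("data/uniref_*")), cookies)
```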
Thank you for your help,
Steven