Hello,
I would like to upload a very large dataset (UniRef) to a Fuseki
database.
I first tried to upload the files one by one, but the upload time grew
exponentially with each file added.
Code used:
```python
import requests
from requests_toolbelt import MultipartEncoder

# POST one Turtle file to the dataset's Graph Store endpoint
url: str = f"{jena_url}/{db_name}/data"
multipart_data: MultipartEncoder = MultipartEncoder(
    fields={
        "file": (
            file_name,
            open(path_file, "rb"),
            "text/turtle",
        )
    }
)
response: requests.Response = requests.post(
    url,
    data=multipart_data,
    headers={"Content-Type": multipart_data.content_type},
    cookies=cookies,
)
```
Then I tried loading with the command-line tool tdb2.tdbloader.
By loading all the files in a single command, the load became very much
faster. tdb2.tdbloader also has an option to parallelize the load.
Code used:
```bash
bin/tdb2.tdbloader --loader=parallel --loc fuseki/base/databases/uniref/ data/uniref_*
```
The problem with tdb2.tdbloader is that it does not work over HTTP.
I would like to know whether it is possible to get the same performance
as tdb2.tdbloader (loading all files at once, parallelization, ...)
using HTTP requests.
I'm also open to other suggestions for optimizing this file loading.
What explains this exponential growth of the upload time when the data
is added in several batches?
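For example, I wondered whether sending everything in a single streaming
request would help. A minimal sketch of what I have in mind (same
`jena_url`/`db_name`/`cookies` variables as above; I am assuming here the
files are N-Triples, since Turtle files with prefix directives cannot be
naively concatenated):

```python
import glob
from typing import Iterator, Optional

import requests


def stream_files(paths: list[str], chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the contents of all the files one chunk at a time, so the
    whole dataset never has to fit in memory."""
    for path in paths:
        with open(path, "rb") as handle:
            while chunk := handle.read(chunk_size):
                yield chunk


def upload_all(jena_url: str, db_name: str, paths: list[str],
               cookies: Optional[dict] = None) -> requests.Response:
    # Passing a generator as `data=` makes requests send the body with
    # chunked transfer encoding, i.e. one POST covering all the files.
    return requests.post(
        f"{jena_url}/{db_name}/data",  # same Graph Store endpoint as above
        data=stream_files(paths),
        headers={"Content-Type": "application/n-triples"},
        cookies=cookies,
    )


# Usage idea (untested against my server):
# upload_all(jena_url, db_name, sorted(glob.glob("data/uniref_*")), cookies)
```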
Thank you for your help,
Steven