Fuseki Graph Store Protocol: Streaming or not?

Adrian Gschwend Tue, 29 Jun 2021 06:07:36 -0700

Hi everyone,

We have automated pipelines that write to Fuseki using the SPARQL Graph
Store Protocol. This works fine for smaller chunks of data, but when we
write a larger dataset of around 15 million triples in one batch, the
request fails.


Looking into what happens, we see an out-of-memory (OOM) error.

We send application/n-triples, so I was expecting Fuseki to stream it.
With tdbloader, a dataset of this size is not an issue at all.

In this particular setup we first used TDB; the machine has 6 GB of
memory assigned.

TDB2 seems to behave a bit better: it runs through without an OOM error
but takes 1.5 hours for the job, whereas it takes less than 15 minutes
when we split the data into smaller chunks and send it in batches of
~100k triples via the Graph Store Protocol.
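
A minimal sketch of this kind of batched upload, for illustration only.
It relies on N-Triples being line-based (one triple per line), so a file
can be cut at line boundaries. The endpoint URL, the graph URI and the
function names (batches, post_batch, upload) are assumptions made for
the example, not the actual pipeline code.

```python
import itertools
import urllib.parse
import urllib.request
from typing import Iterable, Iterator


def batches(lines: Iterable[str], size: int = 100_000) -> Iterator[str]:
    """Yield N-Triples documents holding at most `size` triples each."""
    # Skip blank lines and comment lines so only triples are counted.
    triples = (
        ln for ln in lines
        if ln.strip() and not ln.lstrip().startswith("#")
    )
    while True:
        chunk = list(itertools.islice(triples, size))
        if not chunk:
            return
        # Make sure every triple ends with a newline before joining.
        yield "".join(ln if ln.endswith("\n") else ln + "\n" for ln in chunk)


def post_batch(endpoint: str, graph_uri: str, body: str) -> int:
    """POST one batch; in the Graph Store Protocol, POST merges into the graph."""
    url = endpoint + "?" + urllib.parse.urlencode({"graph": graph_uri})
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/n-triples"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def upload(path: str, endpoint: str, graph_uri: str, size: int = 100_000) -> int:
    """Send the file at `path` in batches; return the number of requests made."""
    sent = 0
    with open(path, encoding="utf-8") as fh:
        for body in batches(fh, size):
            post_batch(endpoint, graph_uri, body)
            sent += 1
    return sent


# Hypothetical usage (URL and graph URI are placeholders):
# upload("data.nt", "http://localhost:3030/ds/data", "http://example.org/g")
```

One caveat with this approach: blank node labels are scoped to a single
request, so blank nodes that are shared across batches would end up as
distinct nodes in the store.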

Interestingly, we never see more than 1 GB of RAM in use, so I'm even
more confused.

Is this OOM error to be expected for large Graph Store Protocol writes?


regards

Adrian
