On 29/10/2021 21:00, Dragan Lesic wrote:
Hello,
I'm trying to migrate an application with content from themoviedb.org and
other sources.
The dataset in Fuseki holds about 55 million triples.
To preserve the order of connected data, reification is used; the data is
inserted with SPARQL RDF*.

RDF* and RDF-star are different.

RDF* is the name for the original work by Olaf Hartig in collaboration with Bryan Thompson/Blazegraph.

RDF-star is the community work based on RDF*.
https://w3c.github.io/rdf-star/cg-spec/2021-07-01.html

In particular, in RDF-star the quoted triple may, or may not, be in the graph. This changes the indexing and hence query performance. Currently, Jena does not maintain an additional index because that would require everyone else to reload data (whether using RDF-star or not).

In RDF*, <<>> means quote the triple and assert it.
In RDF-star, it only quotes the triple.

Annotation syntax bridges the gap.

WHERE { sub:123 shema:genres ?o {| shema:order ?order |} }


which is equivalent to writing

WHERE { sub:123 shema:genres ?o .
        << sub:123 shema:genres ?o  >> shema:order ?order .
}

That should be faster - please let us know.
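
As a sketch, the complete query from the original message rewritten with annotation syntax might look like the following. The prefixes and the SELECT clause are copied from the query quoted below; this has not been run against the dataset in question.

```sparql
PREFIX sub: <https://myexample.com/movie/>
PREFIX shema: <https://schema.org/>
SELECT ?o
WHERE {
  # Matches the asserted triple and the quoted triple carrying shema:order.
  sub:123 shema:genres ?o {| shema:order ?order |}
}
```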

    Andy


When querying, the performance is horrible. Example query:

PREFIX sub: <https://myexample.com/movie/>
PREFIX shema: <https://schema.org/>
SELECT ?o
WHERE { <<sub:123 shema:genres ?o>> shema:order ?order . }

This simple query which returns 10 triples takes about 195 seconds.
On Blazegraph it's 50 ms!
A test with a small dataset is fast.
I'm using Jena 4.2.0, Fuseki2 and TDB2,
in a cloud Docker environment with 32 GB RAM for the instance and fast storage.

Any suggestions on this? Is there any configuration I am missing?
Thank you very much, best regards.
