Hi

Just wanted to say thanks to both Andy and Lorenz for amazing answers!

Certainly gives me a great place to start looking at this low level stuff.

When you say that Jena uses the Volcano design, is this a custom implementation of Volcano, or is there some external library that Jena (and hence me!) can use?

thanks again

graham

On 23/08/21 10:51 pm, Andy Seaborne wrote:


On 22/08/2021 09:02, Lorenz Buehmann wrote:
some implementation details are among the docs, like for TDB you can see here: https://jena.apache.org/documentation/tdb/architecture.html

Execution via ARQ can be found here: https://jena.apache.org/documentation/query/arq-query-eval.html

In general, I think Jena also follows the Volcano design for query evaluation which is quite common for databases in general.


More pointers and information is provided soon by Andy for sure ...

Mainly this mailing list :-)

On the storage side (for TDB2 unless noted otherwise):
TIM = Transactions In Memory = the thing behind DatasetFactory.createTxnMem().

TDB2:

* All RDF terms are referred to by 64bit id (NodeId).
* There is a dictionary of RDF terms (URIs, literals, bnodes, quoted triples for RDF-star) called the Node table. * Some literals are inlined - integer values upto 56bits are stored directly in the NodeId, not in the NodeTable. Dates, datetimes, doubles, booleans etc. If they don't fit inline, the long form goes into the node table.

* Triples are stored as 3 NodeIds. Fix length, 24 bytes, packed into blocks with no additional per-triple space.
Quads for named graphs are 4 NodeIds.

* Indexes are B+trees with MVCC blocks (MultiVersion Concurrency Control) for transactions.

* The B+Trees store 8k blocks, so the tree is about 100 to 200 way fan out and sorted.

* triples are totally indexes - normally SPO, POS and OSP. So to lookup ":s :p ?var" use SPO to find the start SP and scan until P changes.

* There is no formal triple table. The indexes have all the information needed - 3 NodeIds.

* Buffering - the node table has a large RAM cache in-front of it.

* Buffering - indexes use memory mapped files so the OS does the buffering. This reduces the admin burden on users.

Execution: see OpExecutor.

* Yes - it's a volcano-style evaluator. It pulls triples from iterators.

* Optimization is high-level rewrite of the SPARQL algebra, including adding specialised alegra operators. The two big optimizations are trying to use index joins (small left hand side) and filter placement (as soon as all required variables available). There are others but those two have the biggest impact.

* Low level optimization includes trying to reorder basic graph patterns to be in a better order. Rather boringly, the fixed data independent algorithm does nearly as well as the statictics driven one! and requires no admin.

* The execution is general purpose and not, specialised to analytics workloads. It tends to execute with joins that do not use a lot of memory memory for example.

* Each query execution is not parallel (generally).

For Fuseki, parallelism happens across multiple requests running concurrently rather than parallelism within one query execution.

It's not that long ago that typical low-ish end machines have enough (real) cores to make parallelism worth it. Interesting direction to go in and some done experimentally. (lesson 1 from cocurrency and optimication: they have their own costs! sometime its just better to execute a query then think too long about it!)

TIM:
* It stores Node objects so no dictionary needed. Java object references and value-based .equals are the dictionary. Also, the parsers all add a high degree of object sharing.

* It is MVCC trees so again transactions are MR+SW: multiple-readers-and-single-writer at the same time.


All storage datasets are transactional - compositions of data from different places is only weakly transactional and MRSW (multiple read OR a single writer).

That's the main low level details.

    Andy


On 22.08.21 01:24, graham wrote:
Hi

I was wondering if there is any documentation, other than the code and comments, about how Jena stores data on disk (e.g. the RDF equivalents of things like record layouts, record linkages, etc)?

Also is there any similar documentation for how Jena executes queries (e.g. things like stream structure, buffering, operator parallelism, etc)?

thanks

graham

--
                    Veni, Vidi, VISA: I came, I saw, I did a little shopping.

Reply via email to