Re: Low level details

graham Mon, 23 Aug 2021 16:13:59 -0700

Hi

Just wanted to say thanks to both Andy and Lorenz for amazing answers!


Certainly gives me a great place to start looking at this low level stuff.

When you say that Jena uses the Volcano design, is this a customimplementation of Volcano, or is there some external library that Jena(and hence me!) can use?


thanks again

graham

On 23/08/21 10:51 pm, Andy Seaborne wrote:

On 22/08/2021 09:02, Lorenz Buehmann wrote:
some implementation details are among the docs, like for TDB you cansee here: https://jena.apache.org/documentation/tdb/architecture.html
Execution via ARQ can be found here:https://jena.apache.org/documentation/query/arq-query-eval.html
In general, I think Jena also follows the Volcano design for queryevaluation which is quite common for databases in general.
More pointers and information is provided soon by Andy for sure ...
Mainly this mailing list :-)

On the storage side (for TDB2 unless noted otherwise):
TIM = Transactions In Memory = the thing behindDatasetFactory.createTxnMem().
TDB2:

* All RDF terms are referred to by 64bit id (NodeId).
* There is a dictionary of RDF terms (URIs, literals, bnodes, quotedtriples for RDF-star) called the Node table.* Some literals are inlined - integer values upto 56bits are storeddirectly in the NodeId, not in the NodeTable. Dates, datetimes,doubles, booleans etc. If they don't fit inline, the long form goesinto the node table.
* Triples are stored as 3 NodeIds. Fix length, 24 bytes, packed intoblocks with no additional per-triple space.
Quads for named graphs are 4 NodeIds.
* Indexes are B+trees with MVCC blocks (MultiVersion ConcurrencyControl) for transactions.
* The B+Trees store 8k blocks, so the tree is about 100 to 200 way fanout and sorted.
* triples are totally indexes - normally SPO, POS and OSP. So tolookup ":s :p ?var" use SPO to find the start SP and scan until Pchanges.
* There is no formal triple table. The indexes have all theinformation needed - 3 NodeIds.
* Buffering - the node table has a large RAM cache in-front of it.
* Buffering - indexes use memory mapped files so the OS does thebuffering. This reduces the admin burden on users.
Execution: see OpExecutor.

* Yes - it's a volcano-style evaluator. It pulls triples from iterators.
* Optimization is high-level rewrite of the SPARQL algebra, includingadding specialised alegra operators. The two big optimizations aretrying to use index joins (small left hand side) and filter placement(as soon as all required variables available). There are others butthose two have the biggest impact.
* Low level optimization includes trying to reorder basic graphpatterns to be in a better order. Rather boringly, the fixed dataindependent algorithm does nearly as well as the statictics drivenone! and requires no admin.
* The execution is general purpose and not, specialised to analyticsworkloads. It tends to execute with joins that do not use a lot ofmemory memory for example.
* Each query execution is not parallel (generally).
For Fuseki, parallelism happens across multiple requests runningconcurrently rather than parallelism within one query execution.
It's not that long ago that typical low-ish end machines have enough(real) cores to make parallelism worth it. Interesting direction to goin and some done experimentally. (lesson 1 from cocurrency andoptimication: they have their own costs! sometime its just better toexecute a query then think too long about it!)
TIM:
* It stores Node objects so no dictionary needed. Java objectreferences and value-based .equals are the dictionary. Also, theparsers all add a high degree of object sharing.
* It is MVCC trees so again transactions are MR+SW:multiple-readers-and-single-writer at the same time.
All storage datasets are transactional - compositions of data fromdifferent places is only weakly transactional and MRSW (multiple readOR a single writer).
That's the main low level details.

    Andy
On 22.08.21 01:24, graham wrote:
Hi
I was wondering if there is any documentation, other than the codeand comments, about how Jena stores data on disk (e.g. the RDFequivalents of things like record layouts, record linkages, etc)?
Also is there any similar documentation for how Jena executesqueries (e.g. things like stream structure, buffering, operatorparallelism, etc)?
thanks

graham

--
                    Veni, Vidi, VISA: I came, I saw, I did a little shopping.

Re: Low level details

Reply via email to