Re: SPI DatasetGraph creating Triples/Quads on demand using DatasetGraphInMemory

A. Soroka Fri, 12 Feb 2016 10:23:01 -0800

I wrote the DatasetGraphInMemory code, but I suspect your question may be 
better answered by other folks who are more familiar with Jena's DatasetGraph 
implementations, or may actually not have anything to do with DatasetGraph (see 
below for why). I will try to give some background information, though.

There are several paths by which DatasetGraphInMemory can perform finds, but 
they come down to two places in the code, QuadTable::find and 
TripleTable::find, and in default operation, the concrete forms:

https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/core/mem/PMapQuadTable.java#L100

for Quads and

https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/core/mem/PMapTripleTable.java#L99

for Triples. Those methods are reused by all the differently-ordered indexes 
within HexTable and TriTable. Each table answers a find by selecting an 
appropriately-ordered index based on the fixed and variable slots in the find 
pattern and using the concrete methods above to stream tuples back.
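
To make the index-selection idea concrete, here is a minimal sketch for the 
triple case. It is not Jena's actual selection code: the class IndexChoice, 
the Order enum, and the "longest prefix of fixed slots" rule are assumptions 
made for illustration only.

```java
/** Hypothetical illustration of choosing an index by pattern; not Jena code. */
public class IndexChoice {

    /** Three orderings over the slot positions subject=0, predicate=1, object=2. */
    enum Order {
        SPO(0, 1, 2), POS(1, 2, 0), OSP(2, 0, 1);

        final int[] slots;

        Order(int... slots) { this.slots = slots; }

        /** Counts how many leading slots of this ordering the pattern fixes. */
        int fixedPrefix(boolean[] fixed) {
            int n = 0;
            for (int slot : slots) {
                if (!fixed[slot]) break;
                n++;
            }
            return n;
        }
    }

    /** Picks the ordering with the longest prefix of fixed slots (assumed rule). */
    static Order choose(boolean sFixed, boolean pFixed, boolean oFixed) {
        boolean[] fixed = { sFixed, pFixed, oFixed };
        Order best = Order.SPO;
        for (Order candidate : Order.values()) {
            if (candidate.fixedPrefix(fixed) > best.fixedPrefix(fixed)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, false)); // only subject fixed
        System.out.println(choose(false, true, true));  // predicate and object fixed
        System.out.println(choose(false, false, true)); // only object fixed
    }
}
```

An ordering whose leading slots are all fixed lets a map-based index descend 
directly to the matching entries instead of scanning.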

As to why you are seeing your methods called in some places and not in others: 
DatasetGraphBaseFind features methods like findInDftGraph(), 
findInSpecificNamedGraph(), findInAnyNamedGraphs() etc., and these are the 
methods that DatasetGraphInMemory implements. DSGInMemory does not make a 
selection between those methods; that is done by DatasetGraphBaseFind. So that 
is where you will find the logic that should answer your question.
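
The division of labour described above can be sketched as a toy model. This is 
not a copy of DatasetGraphBaseFind: the classes DispatchSketch, BaseFind and 
Tracing, the DEFAULT_GRAPH marker, and the use of null as a wildcard are all 
assumptions, and the real routing has more cases and should be read in the 
source.

```java
/** Toy model of "the base class routes, the subclass implements"; not Jena code. */
public class DispatchSketch {

    /** Hypothetical marker standing in for the default graph name. */
    static final String DEFAULT_GRAPH = "urn:example:default-graph";

    /** The base class owns the choice between the findIn* methods. */
    static abstract class BaseFind {

        /** Routes on the graph slot only; a null graph means "any" in this sketch. */
        final String find(String g, String s, String p, String o) {
            if (g == null) {
                return findInAnyNamedGraphs(s, p, o);
            }
            if (DEFAULT_GRAPH.equals(g)) {
                return findInDftGraph(s, p, o);
            }
            return findInSpecificNamedGraph(g, s, p, o);
        }

        abstract String findInDftGraph(String s, String p, String o);

        abstract String findInSpecificNamedGraph(String g, String s, String p, String o);

        abstract String findInAnyNamedGraphs(String s, String p, String o);
    }

    /** A subclass only supplies the leaf methods; here each reports its own name. */
    static class Tracing extends BaseFind {
        @Override String findInDftGraph(String s, String p, String o) {
            return "findInDftGraph";
        }

        @Override String findInSpecificNamedGraph(String g, String s, String p, String o) {
            return "findInSpecificNamedGraph";
        }

        @Override String findInAnyNamedGraphs(String s, String p, String o) {
            return "findInAnyNamedGraphs";
        }
    }

    public static void main(String[] args) {
        BaseFind dsg = new Tracing();
        System.out.println(dsg.find(DEFAULT_GRAPH, null, null, null));
        System.out.println(dsg.find("urn:example:g1", null, null, null));
        System.out.println(dsg.find(null, null, null, null));
    }
}
```

In a layout like this, a breakpoint placed on one entry point is only hit by 
callers that use that entry point, so the leaf methods are the more reliable 
place to intercept.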

Can you say a little more about your use case? You seem to have some efficient 
representation in memory of your data (I hope it is in-memory; otherwise it is 
a very bad choice to subclass DSGInMemory) and you want to create tuples on the 
fly as queries are received. That is really not at all what DSGInMemory is for 
(DSGInMemory uses map structures for indexing and, in default mode, persistent 
data structures to support transactionality). I am wondering whether 
you might not be much better served by tapping into Jena at a different place, 
perhaps implementing the Graph SPI directly. Or, if reusing DSGInMemory is the 
right choice, just implementing Quad- and TripleTable and using the constructor 
DatasetGraphInMemory(final QuadTable i, final TripleTable t).
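
As a rough sketch of the "implement the table yourself" option, the following 
shows a find that builds tuples only while its result stream is consumed. It 
deliberately avoids Jena types so that it runs on its own: Tuple3, 
FindOnlyTable and OnDemandTable are hypothetical stand-ins, strings stand in 
for nodes, and null stands in for a wildcard. The real QuadTable and 
TripleTable interfaces declare more than a find method, and this sketch omits 
all of that.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch of on-demand tuple creation behind a find method; not Jena code. */
public class OnDemandTableSketch {

    /** Hypothetical stand-in for a triple. */
    static final class Tuple3 {
        final String s, p, o;

        Tuple3(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }

        @Override public String toString() { return s + " " + p + " " + o; }
    }

    /** Hypothetical stand-in for the find side of a triple table. */
    interface FindOnlyTable {
        Stream<Tuple3> find(String s, String p, String o);
    }

    /** Keeps compact native records and creates tuples only for matches. */
    static final class OnDemandTable implements FindOnlyTable {
        private final List<String[]> nativeRecords; // stand-in for the binary source
        int materialized = 0;                       // tuples actually created so far

        OnDemandTable(List<String[]> nativeRecords) {
            this.nativeRecords = nativeRecords;
        }

        @Override
        public Stream<Tuple3> find(String s, String p, String o) {
            return nativeRecords.stream()
                    .filter(r -> matches(s, r[0]) && matches(p, r[1]) && matches(o, r[2]))
                    .map(r -> {
                        materialized++;
                        return new Tuple3(r[0], r[1], r[2]);
                    });
        }

        /** A null pattern slot matches any value. */
        private static boolean matches(String pattern, String value) {
            return pattern == null || pattern.equals(value);
        }
    }

    public static void main(String[] args) {
        OnDemandTable table = new OnDemandTable(Arrays.asList(
                new String[] { "a", "knows", "b" },
                new String[] { "a", "likes", "c" },
                new String[] { "b", "knows", "c" }));
        List<Tuple3> hits = table.find(null, "knows", null).collect(Collectors.toList());
        System.out.println(hits + " (" + table.materialized + " tuples created)");
    }
}
```

This sketch scans every native record linearly; a real implementation would 
presumably use whatever lookup structure the native form already provides.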

---
A. Soroka
The University of Virginia Library

> On Feb 12, 2016, at 12:58 PM, Dick Murray <dandh...@gmail.com> wrote:
> 
> Hi.
> 
> Does anyone know the "find" paths through DatasetGraphInMemory please?
> 
> For example if I extend DatasetGraphInMemory and override
> DatasetGraphBaseFind.find(node, Node, Node, Node) it breakpoints on "select
> * where {?s ?p ?o}" however if I override the other
> DatasetGraphBaseFind.find(...) methods, "select * where {graph ?g {?s ?p
> ?o}}" does not trigger a breakpoint i.e. I don't know what method it's
> calling (but as I type I'm guessing it's optimised to return the HexTable
> nodes...).
> 
> Would I be better off overriding HexTable and TriTable classes find methods
> when I create the DatasetGraphInMemory? Are all finds guaranteed to end in
> one of these methods?
> 
> I need to know the root find methods so that I can shim them to create
> triples/quads before they perform the find.
> 
> I need to create Triples/Quads on demand (because a bulk load would create
> ~100M triples but only ~1000 are ever queried) and the source binary form
> is more efficient (binary ~1GB native tree versus TDB ~50GB ~100M quads)
> than quads.
> 
> Regards Dick Murray.
