Niels Andersen wrote:
If the query above returned 10,000 children in iterator1, then
iterator2 will be called 10,000 times. This does not seem to be very
efficient.
Compared with what?
If the pattern for iterator2, evaluated without the information from
iterator1, returns 10,000,000 items (which it would in the hash join
case), then a hash join would perform worse.
To the best of my knowledge, TDB already has indexed lists of OSP,
POS and SPO. I would have thought that there was a way to run the
second query by just passing an ordered list of the objects returned
in the first query. This provides for far better matching than having
to run the same query many times.
SPO, POS, OSP are not lists (they are B+Trees with range scans).
TDB usually uses an index join (there are hash joins as well).
It does this efficiently: it does not retrieve the RDF term
representation (which would require persistent storage access, although
that is heavily cached) but works with the internal numbers used in the
index.
{ :nodeResource :child ?X .
?X rdfs:label ?Y
}
TDB will, in the absence of a stats file, execute in that order.
If you swap them, it will still start at ":nodeResource :child ?X ."
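To make the index join versus hash join comparison concrete, here is a rough sketch in Python. It is not TDB's code: a sorted list with bisect stands in for the B+Tree SPO index, the integer node IDs and the names range_scan, index_join and hash_join are all made up for illustration, and the data is a toy example.

```python
from bisect import bisect_left

# Hypothetical integer node IDs standing in for RDF terms (an assumption,
# not TDB's actual numbering).
NODE_RESOURCE, CHILD, LABEL = 1, 2, 3

# Triples as (S, P, O) tuples of node IDs. A sorted list stands in for a
# B+Tree SPO index; both support ordered prefix (range) scans.
SPO = sorted([
    (1, 2, 10),    # :nodeResource :child 10
    (1, 2, 11),    # :nodeResource :child 11
    (10, 3, 100),  # 10 rdfs:label 100
    (11, 3, 101),  # 11 rdfs:label 101
    (12, 3, 102),  # label of a node that is not a child of :nodeResource
])


def range_scan(index, s, p):
    """Yield every triple whose (S, P) prefix matches, in index order."""
    i = bisect_left(index, (s, p))  # seek to the first candidate
    while i < len(index) and index[i][:2] == (s, p):
        yield index[i]
        i += 1


def index_join(index):
    """For each ?X from the first pattern, probe the index for its labels."""
    results = []
    for _, _, x in range_scan(index, NODE_RESOURCE, CHILD):
        for _, _, y in range_scan(index, x, LABEL):
            results.append((x, y))
    return results


def hash_join(index):
    """Build a hash set of ?X, then scan every label triple and probe it."""
    children = {o for s, p, o in index if s == NODE_RESOURCE and p == CHILD}
    # Without bindings from the first pattern, this touches all label triples.
    return [(s, o) for s, p, o in index if p == LABEL and s in children]
```

Both functions give the same answers. The difference is that index_join only touches label entries for the ?X values actually found, working on node IDs throughout, while hash_join has to read every rdfs:label triple in the store.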
I don't see where filtering "< 5" fits into this example. rdfs:label
values are typically strings.
FILTER(?Z > 12345) is faster if done by TDB than in API code.
If you are calling the pattern repeatedly with different :nodeResource
values, then you will incur overhead.
Andy