I'm seeking guidance on what to expect from TDB SPARQL performance as
the size and complexity of queries grow.

The dataset has about 600 million triples, around 200 million
non-literal nodes, about 500 predicates.

I generate SPARQL queries from logical rules, which, as it turns out,
can be complicated. In most cases the generated SPARQL performs acceptably.
But multiple (possibly nested) OPTIONAL and UNION clauses seem to tip
the scales toward poor performance.

Does anyone have stories from experience or pointers to literature on
the following points:

- Effect of multiple and nested OPTIONAL clauses on how query cost grows.
- What sort of growth in query cost do additional UNION clauses cause:
linear, logarithmic, exponential?
- Does the absolute number of variables or predicates mentioned in the
query affect performance as much as the complexity of the graph
patterns?
- Effect of complex, non-normalized FILTER expressions.

The TDB dataset is on a fairly powerful machine. I'm less interested in
absolute performance numbers than in the relative performance of
different queries on the same dataset and platform.
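
To illustrate the kind of relative comparison meant here, below is a
minimal sketch in Python that builds two families of SPARQL queries of
increasing size: one with OPTIONAL clauses nested inside each other, one
with a growing number of UNION branches. The function names, the ex:
prefix and the predicates ex:p0, ex:p1, ... are made up for illustration;
they are not taken from the actual dataset or from the rule-generated
queries.

```python
# Hypothetical sketch: build SPARQL query families of growing size so that
# their run times can be compared on the same dataset and platform.
# The prefix and predicate names are placeholders, not real data.

PREFIX = "PREFIX ex: <http://example.org/>\n"


def nested_optionals(depth):
    """One required triple pattern plus `depth` OPTIONALs nested in each other."""
    body = ""
    # Build from the innermost OPTIONAL outward.
    for i in range(depth, 0, -1):
        body = f" OPTIONAL {{ ?s ex:p{i} ?o{i} .{body} }}"
    return f"{PREFIX}SELECT * WHERE {{ ?s ex:p0 ?o0 .{body} }}"


def union_branches(count):
    """A UNION of `count` single-triple branches (count >= 1)."""
    branches = " UNION ".join(f"{{ ?s ex:p{i} ?o }}" for i in range(count))
    return f"{PREFIX}SELECT * WHERE {{ {branches} }}"


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(nested_optionals(n))
        print(union_branches(n))
```

Running each generated query against the same TDB store and recording
the elapsed time per value of n would show whether the growth looks
linear or worse, which is the relative measure asked about above.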

Thanks,
--Paul
