I'm seeking guidance on setting expectations for TDB SPARQL performance as the size and complexity of queries grow.
The dataset has about 600 million triples, around 200 million non-literal nodes, and about 500 predicates. I generate SPARQL queries from logical rules, which, as it turns out, can be complicated. In most cases the generated SPARQL performs acceptably, but multiple (possibly nested) OPTIONAL and UNION clauses seem to tip the scales toward poor performance.

Does anyone have stories from experience, or pointers to literature, on the following points:

- Effect of multiple and nested OPTIONAL clauses on process growth.
- What sort of process growth do additional UNION clauses cause: linear, logarithmic, exponential?
- Does the absolute number of variables or predicates mentioned in the query affect performance as much as the complexity of the graph patterns?
- Effect of complex, non-normalized FILTER expressions.

The TDB dataset is on a fairly powerful machine. I'm not so much interested in absolute performance numbers as in the relative performance of different queries on the same dataset and platform.

Thanks,
--Paul