On 09/03/17 14:48, Osma Suominen wrote:

I wanted to report a performance regression I found. It was probably
introduced by a change to the query optimizer during Jena 3.1.1
development. The change itself may be rather benign, but the result was
a severe performance regression in my application.

It is the more cautious optimization. The optimizer does not distinguish the case of a UNION binding variables in some solutions and not in others from the case of variables being set in nested OPTIONALs.

IMO the rewrite is better anyway.

Thanks for reporting it - it is useful information for any future optimization work, but I can't see a limited-scope fix to apply. I have it set up for investigation locally.


With YSO [1] as data loaded into TDB, this query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * {
  <http://www.yso.fi/onto/yso/p8627> ?p ?o .
  OPTIONAL {
    { ?p rdfs:subPropertyOf ?pp }
    UNION
    { ?o a ?ot }
  }
}

takes about 300 ms on Jena 3.2.0, while it took only around 25 ms on
Jena 3.1.0.

The fix was to separate the single OPTIONAL block into two:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * {
  <http://www.yso.fi/onto/yso/p8627> ?p ?o .
  OPTIONAL { ?p rdfs:subPropertyOf ?pp }
  OPTIONAL { ?o a ?ot }
}

The result is that both Jena versions execute the query in around 25 ms.

You may wonder why I had a query like that in the first place. This is
not the actual query I started with, which is a much more complex
CONSTRUCT query with many UNIONs within the OPTIONAL block (see [2]).

The important thing was to separate the OPTIONAL block dealing with ?p
from the OPTIONAL block dealing with ?o. As long as a block only deals
with one variable from the pattern above, it may contain multiple
UNIONs. In fact it makes sense to use UNIONs there to avoid internal
cross products and combinatorial explosion when there are multiple
solutions for each pattern.
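
As a rough illustration of that shape (a sketch only, not the actual
query from [2]): each OPTIONAL block below touches just one of ?p or ?o,
and uses UNION inside the block so that alternative matches produce
separate rows instead of being multiplied together. The SELECT * wrapper
and the rdfs:label branches (with ?plabel and ?olabel) are assumptions
added for the example and do not come from the original query.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * {
  <http://www.yso.fi/onto/yso/p8627> ?p ?o .

  # Block 1: only refers to ?p from the outer pattern.
  OPTIONAL {
    { ?p rdfs:subPropertyOf ?pp }
    UNION
    { ?p rdfs:label ?plabel }     # hypothetical extra branch
  }

  # Block 2: only refers to ?o from the outer pattern.
  OPTIONAL {
    { ?o a ?ot }
    UNION
    { ?o rdfs:label ?olabel }     # hypothetical extra branch
  }
}
```

Within one block, m matches of the first branch and n matches of the
second give m + n rows for a given outer solution, whereas writing the
two branches as a single joined pattern would give m * n rows.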


[1] http://api.finto.fi/download/yso/yso-skos.ttl

