We have observed some surprising performance differences between versions of 
Fuseki.
The versions used in this example are 3.14.0 (referred to as "f14" henceforth), 
3.16.0 ("f16" henceforth), and 3.17.0 ("f17" henceforth).

We have a large database in TDB1 (built using Jena TDB version 3.14.0).  There 
are roughly 140 million triples in the dataset.

We discovered that many of our queries take much longer using f17 than using 
f14 or f16; that's right, the newer Fuseki is much slower than the older ones.

Most of our testing was done with f14, but light testing with f16 suggests 
that the slowdown was introduced after f16.

We isolated a simple query that exhibits this behavior; this query reliably 
runs 3x to 4x slower in f17 than in f14.

SELECT *
WHERE {
  ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix .
  FILTER (xsd:integer(xsd:integer(?pix) - FLOOR(xsd:integer(?pix)/7200)*7200) = xsd:integer(1002))
  ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?CONSUMERID .
}

The modular arithmetic calculation in the filter is used to limit the number of 
records processed at any one time, and is an essential part of the query.
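To make the filter's arithmetic concrete, here is a minimal Python sketch, 
assuming ?pix holds a non-negative integer. It suggests that the expression 
pix - FLOOR(pix/7200)*7200 is simply the remainder of pix divided by 7200, so 
the filter keeps only the records in residue class 1002. The function and 
constant names are illustrative and do not appear in the query.

```python
import math
from fractions import Fraction

BUCKETS = 7200  # modulus used in the FILTER
TARGET = 1002   # residue class selected by the FILTER


def filter_expression(pix: int) -> int:
    """Mirror the FILTER arithmetic: pix - FLOOR(pix / 7200) * 7200.

    SPARQL's "/" on two integers yields an exact decimal, so an exact
    Fraction is used here instead of a float.
    """
    return pix - math.floor(Fraction(pix, BUCKETS)) * BUCKETS


def passes_filter(pix: int) -> bool:
    """True when the record would survive the FILTER."""
    return filter_expression(pix) == TARGET
```

Under that assumption, each distinct target value between 0 and 7199 selects 
a disjoint slice of the data, which is how the query limits the number of 
records processed per run.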

Here are some sample run stats.

Using f14:

time curl -k --data-urlencode [email protected] http://localhost:3030/kg/sparql > /dev/null
real    0m7.159s
user    0m0.008s
sys     0m0.002s


Using f17:

time curl -k --data-urlencode [email protected] http://localhost:3030/kg/sparql > /dev/null
real    0m23.192s
user    0m0.004s
sys     0m0.009s


We ran this test several times and got essentially the same result each time 
(within about 5% in either direction).
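For anyone who wants to repeat the measurement, the following is a rough 
Python sketch of how repeated runs and their spread could be collected. It is 
not the script we used (we used time and curl as shown above). The helper 
names, the query.rq file name, and the form parameter name "query" are 
assumptions made for illustration; the endpoint URL is the one from the curl 
command.

```python
import statistics
import time
import urllib.parse
import urllib.request
from typing import Callable, List, Tuple


def time_runs(run: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call run() `repeats` times and return wall-clock seconds per call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def relative_spread(timings: List[float]) -> Tuple[float, float]:
    """Return how far the fastest and slowest runs sit from the median,
    each expressed as a fraction of the median."""
    mid = statistics.median(timings)
    return (mid - min(timings)) / mid, (max(timings) - mid) / mid


def run_query(endpoint: str = "http://localhost:3030/kg/sparql",
              query_file: str = "query.rq") -> bytes:
    """POST the query as a form field, similar to curl --data-urlencode.

    query.rq is a hypothetical file name; "query" is the standard SPARQL
    protocol parameter and is assumed to match what curl sent.
    """
    with open(query_file, encoding="utf-8") as handle:
        body = urllib.parse.urlencode({"query": handle.read()}).encode()
    with urllib.request.urlopen(endpoint, data=body) as response:
        return response.read()

# Example (requires a running Fuseki server and a query file):
#   timings = time_runs(run_query, repeats=5)
#   print(timings, relative_spread(timings))
```

A spread of (0.05, 0.05) would correspond to the "within about 5%" figure 
quoted above.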

We also started Fuseki with JENA_HOME and JENAROOT pointing first at a Jena 
3.15 installation and then at a Jena 3.17 installation; this made no 
difference in the timing.

We asked tdbquery to explain the query; the result was not very illuminating:


16:54:06 INFO  exec       :: ALGEBRA
  (filter (= (<http://www.w3.org/2001/XMLSchema#integer> (- (<http://www.w3.org/2001/XMLSchema#integer> ?pix) (* (floor (/ (<http://www.w3.org/2001/XMLSchema#integer> ?pix) 7200)) 7200))) (<http://www.w3.org/2001/XMLSchema#integer> 1002))
    (quadpattern
      (quad <urn:x-arq:DefaultGraphNode> ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix)
      (quad <urn:x-arq:DefaultGraphNode> ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?CONSUMERID)
    ))
16:54:06 INFO  exec       :: TDB
  (filter (= (<http://www.w3.org/2001/XMLSchema#integer> (- (<http://www.w3.org/2001/XMLSchema#integer> ?pix) (* (floor (/ (<http://www.w3.org/2001/XMLSchema#integer> ?pix) 7200)) 7200))) (<http://www.w3.org/2001/XMLSchema#integer> 1002))
    (quadpattern
      (quad <urn:x-arq:DefaultGraphNode> ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix)
      (quad <urn:x-arq:DefaultGraphNode> ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?CONSUMERID)
    ))

Any suggestions on how we could get f17 to match the performance we saw with 
f14 (and f16)?

Kind regards,
Omar
