We have observed some surprising performance behavior with different versions
of Fuseki.
The versions used in this example are 3.14.0 (referred to as "f14" henceforth),
3.16.0 ("f16" henceforth), and 3.17.0 ("f17" henceforth).
We have a large database in TDB1 (built using Jena TDB version 3.14.0). There
are roughly 140 million triples in the dataset.
We discovered that many of our queries take much longer with f17 than with
f14 or f16; that's right, the newer Fuseki is much slower than the older ones.
Most of our testing was done with f14, but light testing with f16 suggests
that the slowdown was not introduced until after 3.16.0.
We isolated a simple query that exhibits this behavior; it reliably runs
about three times slower in f17 than in f14 (7.2 s vs. 23.2 s in the runs below):
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT *
WHERE {
  ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix .
  FILTER (xsd:integer(xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200) = xsd:integer(1002) )
  ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?CONSUMERID .
}
The modular arithmetic in the filter keeps only the records whose ?pix is
congruent to 1002 modulo 7200. It limits the number of records processed at
any one time and is an essential part of the query.
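To make the filter's arithmetic concrete, here is a minimal Python sketch of the expression. It assumes ?pix holds plain integers, and it uses exact rational division (`Fraction`) as a stand-in for SPARQL decimal division; `filter_value` and `keeps` are hypothetical helpers, not part of Jena or Fuseki. It shows that the expression is a floor-based modulo, so exactly one residue class out of 7200 passes.

```python
import math
from fractions import Fraction

MODULUS = 7200  # divisor used in the original FILTER
TARGET = 1002   # residue the original FILTER compares against

def filter_value(pix: int) -> int:
    """Mirror of xsd:integer(?pix) - FLOOR(xsd:integer(?pix)/7200)*7200.

    Fraction keeps the division exact (assumed stand-in for decimal division).
    """
    return pix - math.floor(Fraction(pix, MODULUS)) * MODULUS

def keeps(pix: int) -> bool:
    """Hypothetical helper: True when a ?pix value would pass the FILTER."""
    return filter_value(pix) == TARGET

if __name__ == "__main__":
    # The expression is a floor-based modulo, so it agrees with Python's %.
    assert all(filter_value(p) == p % MODULUS for p in range(-20000, 20000))
    # Only one residue class out of 7200 survives the filter.
    print([p for p in range(20000) if keeps(p)])  # [1002, 8202, 15402]
```

Running the query once per residue (0 to 7199) would therefore visit each record exactly once across all runs.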
Here are some sample run stats.
Using f14:
time curl -k --data-urlencode [email protected] http://localhost:3030/kg/sparql > /dev/null
real 0m7.159s
user 0m0.008s
sys 0m0.002s
Using f17:
time curl -k --data-urlencode [email protected] http://localhost:3030/kg/sparql > /dev/null
real 0m23.192s
user 0m0.004s
sys 0m0.009s
We ran this test several times and got essentially the same results each time
(the timings varied by no more than about 5% in either direction).
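A repeatable way to collect such timings is to run the same request several times and compare medians. The Python sketch below is one possible harness, not the procedure used above. `curl_query` is a hypothetical wrapper around the same curl invocation (the query-file name you pass it is your own), and the spread is measured as the largest deviation from the median.

```python
import statistics
import subprocess
import time
from typing import Callable, List

def time_runs(run: Callable[[], None], repeats: int = 5) -> List[float]:
    """Call `run` `repeats` times and return each call's wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples

def spread_percent(samples: List[float]) -> float:
    """Largest deviation from the median, as a percentage of the median."""
    med = statistics.median(samples)
    return max(abs(s - med) for s in samples) / med * 100

def slowdown(old: List[float], new: List[float]) -> float:
    """Ratio of median wall-clock times (new / old)."""
    return statistics.median(new) / statistics.median(old)

def curl_query(query_file: str, endpoint: str) -> None:
    """Hypothetical runner: POST the query file with curl and discard the result."""
    subprocess.run(
        ["curl", "-s", "-k", "--data-urlencode", f"query@{query_file}",
         endpoint, "-o", "/dev/null"],
        check=True,
    )
```

With only the single runs reported above, `slowdown([7.159], [23.192])` is about 3.24. Against a live server one would call something like `time_runs(lambda: curl_query("query.rq", "http://localhost:3030/kg/sparql"))` once per Fuseki version, where `query.rq` is a placeholder file name.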
We also tried starting Fuseki with JENA_HOME and JENAROOT pointing first at a
Jena 3.15 installation and then at a Jena 3.17 installation; this made no
difference to the timing.
We asked tdbquery to explain the query; the result was not very illuminating:
16:54:06 INFO exec :: ALGEBRA
(filter (= (<http://www.w3.org/2001/XMLSchema#integer> (- (<http://www.w3.org/2001/XMLSchema#integer> ?pix) (* (floor (/ (<http://www.w3.org/2001/XMLSchema#integer> ?pix) 7200)) 7200))) (<http://www.w3.org/2001/XMLSchema#integer> 1002))
  (quadpattern
    (quad <urn:x-arq:DefaultGraphNode> ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix)
    (quad <urn:x-arq:DefaultGraphNode> ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?CONSUMERID)
  ))
16:54:06 INFO exec :: TDB
(filter (= (<http://www.w3.org/2001/XMLSchema#integer> (- (<http://www.w3.org/2001/XMLSchema#integer> ?pix) (* (floor (/ (<http://www.w3.org/2001/XMLSchema#integer> ?pix) 7200)) 7200))) (<http://www.w3.org/2001/XMLSchema#integer> 1002))
  (quadpattern
    (quad <urn:x-arq:DefaultGraphNode> ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix)
    (quad <urn:x-arq:DefaultGraphNode> ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?CONSUMERID)
  ))
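Read as plain SPARQL algebra, both plans are a `filter` wrapped around a single `quadpattern` of two quads that share ?record. The toy Python evaluator below illustrates only those semantics on a handful of in-memory triples. It is a naive nested-loop sketch with hypothetical data and helper names, and it does not model how TDB actually orders or executes the plan.

```python
import math
from fractions import Fraction

PRED = "https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID"

def expr(pix: int) -> int:
    """(xsd:integer (- ?pix (* (floor (/ ?pix 7200)) 7200))), assuming integer ?pix."""
    return pix - math.floor(Fraction(pix, 7200)) * 7200

def evaluate(triples):
    """Toy evaluation of (filter (= expr 1002) (quadpattern q1 q2)) on the default graph."""
    results = []
    # First quad: ?record PRED ?pix
    for s1, p1, pix in triples:
        if p1 != PRED:
            continue
        # Second quad joins on ?record and binds ?CONSUMERID (naive nested loop)
        for s2, p2, consumer in triples:
            if p2 == PRED and s2 == s1:
                row = {"record": s1, "pix": pix, "CONSUMERID": consumer}
                # The filter wraps the whole quadpattern, so it is applied to joined rows
                if expr(row["pix"]) == 1002:
                    results.append(row)
    return results

if __name__ == "__main__":
    data = [("r1", PRED, 1002), ("r2", PRED, 5), ("r3", PRED, 8202)]
    for row in evaluate(data):
        print(row)
```

Because both quads use the same predicate, a record with one CONSUMERID value yields one row in which ?CONSUMERID equals ?pix; a record with several values would yield one row per pair of values before filtering.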
Any suggestions on how we could get the same level of performance in f17 that
we had with f14 (or f16)?
Kind regards,
Omar