Hi Andy,

The datatype of ?pix is normally an integer.

I also found out that division was twice as slow for integers here in f17 as 
f14/16, the below filter may not make a lot of sense. That said, I was using it 
to see if it had to do with fundamental arithmetic operations.
FILTER (((?pix)/(?pix)) = 1002 )

Takes ~ 2-3 seconds in f16
Takes ~ 6 seconds in f17

It led me to try what follows and could be telling. I cast the numerator to 
double and it was much faster for the set and yielded the same results of when 
I ran with an integer.

This takes ~ 3 seconds in BOTH f16/f17
FILTER (((?pix)-FLOOR((xsd:double(?pix))/7200)*7200) = (1002) )
F16
real    0m2.794s
user    0m0.005s
sys     0m0.004s

F17
real    0m2.783s
user    0m0.006s
sys     0m0.004s

This takes ~ 20 s in f17  ~ 6 seconds in F16
FILTER (((?pix)-FLOOR(((?pix))/7200)*7200) = (1002) )
F16
real    0m5.774s
user    0m0.006s
sys     0m0.004s
F17
real    0m20.020s
user    0m0.006s
sys     0m0.004s


What follows are the timings on the query you asked for:

For this query

select (count(*) AS ?C)
WHERE {
 ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix .
FILTER (xsd:integer(xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200) = 
xsd:integer(1002) )
}

F16
bash-4.2$ time curl -k --data-urlencode [email protected] -H  
'Accept:application/n-triples' http://localhost:3030/kg/sparql
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#";>
  <head>
    <variable name="C"/>
  </head>
  <results>
    <result>
      <binding name="C">
        <literal 
datatype="http://www.w3.org/2001/XMLSchema#integer";>142</literal>
      </binding>
    </result>
  </results>
</sparql>

real    0m7.668s
user    0m0.005s
sys     0m0.004s


F17
bash-4.2$ time curl -k --data-urlencode [email protected] -H  
'Accept:application/n-triples' http://localhost:3030/kg/sparql
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#";>
  <head>
    <variable name="C"/>
  </head>
  <results>
    <result>
      <binding name="C">
        <literal 
datatype="http://www.w3.org/2001/XMLSchema#integer";>142</literal>
     </binding>
    </result>
  </results>
</sparql>

real    0m23.995s
user    0m0.004s
sys     0m0.005s

F16

bash-4.2$ time curl -k --data-urlencode [email protected] -H  
'Accept:application/n-triples' http://localhost:3030/kg/sparql
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#";>
  <head>
    <variable name="C"/>
  </head>
  <results>
    <result>
      <binding name="C">
        <literal 
datatype="http://www.w3.org/2001/XMLSchema#integer";>142</literal>
      </binding>
    </result>
  </results>
</sparql>

real    0m6.347s
user    0m0.006s
sys     0m0.003s


F17
bash-4.2$ time curl -k --data-urlencode [email protected] -H  
'Accept:application/n-triples' http://localhost:3030/kg/sparql
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#";>
  <head>
    <variable name="C"/>
  </head>
  <results>
    <result>
      <binding name="C">
        <literal 
datatype="http://www.w3.org/2001/XMLSchema#integer";>142</literal>
      </binding>
    </result>
  </results>
</sparql>

real    0m20.952s
user    0m0.004s
sys     0m0.005s

stats about the data:
select (count(*) AS ?C)
WHERE {
 ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix .
}

time curl -k --data-urlencode [email protected] -H  
'Accept:application/n-triples' http://localhost:3030/kg/sparql
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#";>
  <head>
    <variable name="C"/>
  </head>
  <results>
    <result>
      <binding name="C">
        <literal 
datatype="http://www.w3.org/2001/XMLSchema#integer";>1000000</literal>
      </binding>
    </result>
  </results>
</sparql>

real    0m0.546s
user    0m0.004s
sys     0m0.004s


select (count(distinct ?record) AS ?R)
WHERE {
 ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix .
}

time curl -k --data-urlencode [email protected] -H  
'Accept:application/n-triples' http://localhost:3030/kg/sparql  > canary8.out
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  104k    0   313  100  104k      0    228  0:07:48  0:07:47  0:00:01     0

real    7m47.731s
user    0m0.034s
sys     0m0.041s

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#";>
  <head>
    <variable name="R"/>
  </head>
  <results>
    <result>
      <binding name="R">
        <literal 
datatype="http://www.w3.org/2001/XMLSchema#integer";>1000000</literal>
      </binding>
    </result>
  </results>
</sparql>


Best,
Omar

From: Andy Seaborne <[email protected]>
Sent: Friday, June 04, 2021 8:03 AM
To: [email protected]
Subject: Re: possible SPARQL performance regression Fuseki 3.14/3.16 -> 3.17


Hi Omar,



Thank for the detailed report.



It might be the casting - 
JENA-1984<https://urldefense.com/v3/__https:/issues.apache.org/jira/browse/JENA-1984__;!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFVRlY2LI$>
 - which was a change

3.16.0->3.17.0



Could you time these queries please on a fast and on a slow system?

It is checking it is the filter.



select (count(*) AS ?C)

WHERE {

 ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?pix .

 FILTER (xsd:integer(xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200) = 
xsd:integer(1002) )

}



and (less casting):



select (count(*) AS ?C)

WHERE {

 ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?pix .

 FILTER (xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200 = 1002 )

}





It would be useful to know some count stats about the data:

select (count(*) AS ?C)

WHERE {

 ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?pix .

}



select (count(distinct ?record) AS ?R)

WHERE {

 ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?pix .

}



and what is the datatype of ?pix normally?



    Andy






On 04/06/2021 04:09, 
[email protected]<mailto:[email protected]> wrote:

We have observed some surprising performance behavior with different versions 
of fuseki.

The versions used in this example are 3.14.0 (referred to as "f14" henceforth), 
3.16.0 ("f16" henceforth), and 3.17.0 ("f17" henceforth)  .



We have a large database in TDB1 (built using Jena TDB version 3.14.0).  There 
are roughly 140 million triples in the dataset.



We discovered that many of our queries take much longer using f17 than using 
f14 or f16; that's right, the newer fuseki is much slower than the older one.



We tested more with f14, but noticed that perhaps the change in performance 
wasn't realized till after f16 after lightly testing.



We isolated a simple query that exhibits this behavior; this query reliable 
runs 4x slower in f17 than f14.



select *

WHERE {  ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?pix .

FILTER (xsd:integer(xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200) = 
xsd:integer(1002) )

?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?CONSUMERID .

}



The modular arithmetic calculation in the filter is used to limit the number of 
records processed at any one time, and is an essential part of the query.



Here are some sample run stats.



Using f14:



time curl -k --data-urlencode [email protected]<mailto:[email protected]>  
http://localhost:3030/kg/sparql<https://urldefense.com/v3/__http:/localhost:3030/kg/sparql__;!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF-cMtJ6c$>
  > /dev/null

real    0m7.159s

user    0m0.008s

sys     0m0.002s





Using f17:



time curl -k --data-urlencode [email protected]<mailto:[email protected]>  
http://localhost:3030/kg/sparql<https://urldefense.com/v3/__http:/localhost:3030/kg/sparql__;!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF-cMtJ6c$>
  > /dev/null

real    0m23.192s

user    0m0.004s

sys     0m0.009s





We ran this test several times, and got essentially the same result (within 
about 5% on each end).



We also ran this with JENA_HOME and JENAROOT set to a 3.15 Jena installation 
and a 3.17 Jena installation when we started fuseki; this made no difference in 
the timing.



We asked tdbquery to explain the query; the result was not very illuminating:





16:54:06 INFO  exec       :: ALGEBRA

  (filter (= 
(<http://www.w3.org/2001/XMLSchema#integer><https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
 (- 
(<http://www.w3.org/2001/XMLSchema#integer><https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
 ?pix) (* (floor (/ 
(<http://www.w3.org/2001/XMLSchema#integer><https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
 ?pix) 7200)) 7200))) 
(<http://www.w3.org/2001/XMLSchema#integer><https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
 1002\

))

    (quadpattern

      (quad <urn:x-arq:DefaultGraphNode%3E ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?pix)

      (quad <urn:x-arq:DefaultGraphNode%3E ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?CONSUMERID)

    ))

16:54:06 INFO  exec       :: TDB

  (filter (= 
(<http://www.w3.org/2001/XMLSchema#integer><https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
 (- 
(<http://www.w3.org/2001/XMLSchema#integer><https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
 ?pix) (* (floor (/ 
(<http://www.w3.org/2001/XMLSchema#integer><https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
 ?pix) 7200)) 7200))) 
(<http://www.w3.org/2001/XMLSchema#integer><https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
 1002\

))

    (quadpattern

      (quad <urn:x-arq:DefaultGraphNode%3E ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?pix)

      (quad <urn:x-arq:DefaultGraphNode%3E ?record 
<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID><https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
 ?CONSUMERID)

    ))



Any suggestions on how we could get the same level of performance as we were 
with f14(f16) in f17?



Kind regards,

Omar


Reply via email to