Both of your suggestions for rewriting the query worked. I don't fully follow the reasons, but for future cases, is breaking up problematic queries with {} the way to go?

On 04/11/2022 11.25, rve...@dotnetrdf.org wrote:
So yes as suspected the triple patterns are being reordered badly in the BGP:

   (sequence
     (table (vars ?sct_code)
       (row [?sct_code "298314008"])
     )
     (bgp
       (triple ?c skos:inScheme lsu:SNOMEDCT_US)
       (triple ?c skosxl:prefLabel ??0)
       (triple ??0 lsu:code ?sct_code)
     )))

The optimizer does not take into account the fact that the ?sct_code variable
is going to be bound by the VALUES clause (table in the algebra), so it treats
the lsu:code triple pattern as the least specific one (as it appears to have
two variables). That causes it to evaluate a much less specific triple pattern
first.

Lorenz’s suggestion of generating statistics for your dataset is a good one;
statistics would likely tell the optimiser that the ?c skos:inScheme
lsu:SNOMEDCT_US triple pattern is actually very non-specific for your dataset.

You could also try Andy’s suggestion else-thread i.e. --set 
arq:optReorderBGP=false passed to the CLI command in question, or if this is 
being called from code ARQ.getContext().set(ARQ.optReorderBGP, false);

The other thing you can do is explicitly break up your query further i.e.

{ VALUES ?sct_code { "298314008" }
  { _:b0  lsu:code          ?sct_code .
    ?c    skosxl:prefLabel  _:b0 . }
  { ?c    skos:inScheme     lsu:SNOMEDCT_US }
}

This essentially forces the engine to evaluate that very unspecific triple
pattern last.
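
For reference, here is a self-contained sketch of the same grouping idea, written with the bracketed blank-node syntax from the original query. The prefix IRIs are taken from the qparse output posted in this thread. The SELECT ?c projection is an assumption, since the full original query was not posted.

```sparql
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX lsu:    <https://www.lingsoft.fi/ns/umls/>

# Assumption: only ?c is needed in the result.
SELECT ?c
WHERE {
  VALUES ?sct_code { "298314008" }
  # Selective part: the label node carrying the code, and the concept it labels.
  { ?c skosxl:prefLabel [ lsu:code ?sct_code ] }
  # Unselective scheme membership pattern kept in its own group.
  { ?c skos:inScheme lsu:SNOMEDCT_US }
}
```

The solutions should be the same as for the single-BGP form. Only the grouping changes, and with it the order the engine is likely to pick.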

Another possibility would be to change that triple pattern to be in a FILTER 
EXISTS condition, so it’d only be evaluated for matches to your other triple 
patterns i.e.

{ VALUES ?sct_code { "298314008" }
  _:b0  lsu:code          ?sct_code .
  ?c    skosxl:prefLabel  _:b0 .
  FILTER EXISTS { ?c    skos:inScheme     lsu:SNOMEDCT_US }
}
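
A complete version of the FILTER EXISTS variant might look like the sketch below. The prefix IRIs come from the qparse output posted in this thread. The SELECT ?c projection is an assumption, as the full original query was not shown.

```sparql
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX lsu:    <https://www.lingsoft.fi/ns/umls/>

# Assumption: only ?c is needed in the result.
SELECT ?c
WHERE {
  VALUES ?sct_code { "298314008" }
  _:b0  lsu:code          ?sct_code .
  ?c    skosxl:prefLabel  _:b0 .
  # Tested once per candidate ?c; adds no bindings of its own.
  FILTER EXISTS { ?c skos:inScheme lsu:SNOMEDCT_US }
}
```

FILTER EXISTS only tests whether a match exists and does not bind variables. That is fine here because the skos:inScheme pattern introduces no variable beyond ?c.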

Hope this helps,

Rob

From: Lorenz Buehmann <buehm...@informatik.uni-leipzig.de>
Date: Thursday, 3 November 2022 at 11:12
To: users@jena.apache.org <users@jena.apache.org>
Subject: Re: Re: Weird sparql problem

tdbquery --explain --loc  $TDB_LOC  "query here"

would also work to see the plan - maybe also increase log level to see
more: https://jena.apache.org/documentation/tdb/optimizer.html

Another question: did you generate the TDB stats so that they can be
used by the optimizer?

For debugging purposes, you could also disable query optimization (put an
empty none.opt file into the $TDB_LOC/Data-0001 dir) and reorder your query
manually, i.e.

WHERE
   { VALUES ?sct_code { "298314008" }
     _:b0  lsu:code          ?sct_code .
     ?c    skosxl:prefLabel  _:b0 .
     ?c    skos:inScheme     lsu:SNOMEDCT_US
   }

Without stats, the reordering is based on heuristics (e.g. the number of
variables in a triple pattern), so otherwise the last triple pattern might
always be evaluated first.


On 03.11.22 11:11, Mikael Pesonen wrote:
Here's the parse, hope it helps:

WHERE
   { VALUES ?sct_code { "298314008" }
     ?c    skosxl:prefLabel  _:b0 .
     _:b0  lsu:code          ?sct_code .
     ?c    skos:inScheme     lsu:SNOMEDCT_US
   }
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(prefix ((owl: <http://www.w3.org/2002/07/owl#>)
         (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
         (skosxl: <http://www.w3.org/2008/05/skos-xl#>)
         (skos: <http://www.w3.org/2004/02/skos/core#>)
         (dcterms: <http://purl.org/dc/terms/>)
         (rdfs: <http://www.w3.org/2000/01/rdf-schema#>)
         (lsr: <https://resource.lingsoft.fi/>)
         (id: <http://snomed.info/id/>)
         (dcat: <http://www.w3.org/ns/dcat#>)
         (dc: <http://purl.org/dc/elements/1.1/>)
         (lsu: <https://www.lingsoft.fi/ns/umls/>))
   (sequence
     (table (vars ?sct_code)
       (row [?sct_code "298314008"])
     )
     (bgp
       (triple ?c skos:inScheme lsu:SNOMEDCT_US)
       (triple ?c skosxl:prefLabel ??0)
       (triple ??0 lsu:code ?sct_code)
     )))


On 02/11/2022 12.32, rve...@dotnetrdf.org wrote:
For this kind of performance issue it is useful to see the SPARQL
algebra for the whole query, not just fragments of the query.  You
can use the qparse command for the version of Jena you are using to
see how it is optimising your queries e.g.

qparse --explain --query example.rq

As Lorenz suggests this may be the optimiser making a bad guess at
the appropriate order in which to evaluate the triple patterns within
the BGP but without the larger query context or the algebra all we
can do is guess.

Rob

From: Mikael Pesonen <mikael.peso...@lingsoft.fi>
Date: Tuesday, 1 November 2022 at 12:53
To: users@jena.apache.org <users@jena.apache.org>
Subject: Re: Weird sparql problem
Different case, but again the hang makes no sense to a user, whatever
the technical reasons are.

    VALUES ?sct_code { "298314008" }
      ?c skosxl:prefLabel [ lsu:code ?sct_code ]

returns one row immediately, but

    VALUES ?sct_code { "298314008" }
      ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme lsu:SNOMEDCT_US

hangs forever


    skos:inScheme lsu:SNOMEDCT_US;

On 18/10/2022 9.08, Lorenz Buehmann wrote:
Hi,

comments inline

On 17.10.22 14:35, Mikael Pesonen wrote:
This works as a separate query, but not in the middle, since ?s
gets new values instead of binding to the previous ?s.

{ select ?t where {
?s a ?t .
   } limit 10}
    ?t skos:prefLabel ?l
In the middle of what? Subqueries are evaluated first. If you
really want labels for classes, you should use DISTINCT in the
subquery so that the intermediate result is small: there shouldn't
be that many classes, but there are many instances with the same class,
so without it the join would be more expensive than necessary.
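
A minimal sketch of the DISTINCT subquery described above. The skos: prefix IRI is the standard SKOS namespace; the outer SELECT ?t ?l projection is an assumption about what the surrounding query needs.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?t ?l
WHERE {
  # Inner query: collect each class once rather than once per instance.
  { SELECT DISTINCT ?t
    WHERE { ?s a ?t }
  }
  # Join the deduplicated classes with their labels.
  ?t skos:prefLabel ?l
}
```

Only ?t is projected out of the subquery, so ?s is not visible to the outer pattern.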


On 17/10/2022 14.56, Mikael Pesonen wrote:
?s a ?t .
    ?t skos:prefLabel ?l

returns 3 million triples. Maybe it's related to this?
I don't see how this should be related to your initial query where ?s
was bound, which in my opinion should be an easy join. Is it possible
for you to share the dataset somehow? Also, what you can do is
compute statistics for the TDB database with the tdbstats tool [1] from
the command line and put the result into the TDB folder. But even without
statistics, the query plan should take the first triple pattern, use the
spo index as s and p are bound, then pass the bindings of ?o to the
evaluation of the second triple pattern.

[1]
https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
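
As a sketch of the evaluation order described above, the two patterns of the initial query can be written as separate groups, mirroring the grouping suggested elsewhere in this thread for the VALUES query. Whether this changes the plan for this dataset has not been verified. The graph and resource IRIs are the placeholders from the original post.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT *
FROM <https://a.b.c>
WHERE {
  # Subject and predicate are fixed, so this should match very few triples.
  { <https://x.y.z> a ?t }
  # Intended to be evaluated for the ?t values found above.
  { ?t skos:prefLabel ?l }
}
```

Running the query through qparse --explain, as mentioned in this thread, shows the algebra the engine actually produces for it.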



On 21/09/2022 9.15, Lorenz Buehmann wrote:
Weird, only 10M triples and each triple pattern returns only 1
binding, so the size is tiny. Honestly I can't think of
anything except open connections, but as you mentioned, running
the queries with only one triple pattern works as expected, so
too many open connections most likely isn't the issue.

Can you reproduce this behavior with newer Jena versions like 4.6.1?

Or can you reproduce this on different servers as well?

Is it also stuck if you run the query directly after you restart
Fuseki?


On 19.09.22 13:49, Mikael Pesonen wrote:
On 15/09/2022 17.48, Lorenz Buehmann wrote:
Forgot:

- size of result for each triple pattern? Might affect if hash
join can be used.
It's one row for each.
- your hardware?
Normal server with 16gigs mem.
- is it just the first query after starting Fuseki? Connections
have been closed? Note, there was also a bug in a recent Jena
version, but only with TDB and too many open connections. It has
been resolved with release 4.6.1.
Jena has been running quite a while.
Might not be related, but I'm mentioning all things here
nevertheless.


On 15.09.22 11:16, Mikael Pesonen wrote:
This returns one row fast, say :C1

SELECT *
FROM <https://a.b.c>
WHERE {
    <https://x.y.z> a ?t .
    #?t skos:prefLabel ?l
}


and this too:

SELECT *
FROM <https://a.b.c>
WHERE {
    #<https://x.y.z> a ?t .
    :C1 skos:prefLabel ?l
}


But this always hangs until timeout

SELECT *
FROM <https://a.b.c>
WHERE {
    <https://x.y.z> a ?t .
    ?t skos:prefLabel ?l
}

What am I missing here? I'm using Fuseki web GUI. Thanks!
--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's
and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND

