Unsubscribe me, please.

On Fri, Jun 3, 2022 at 10:30 AM Andy Seaborne <a...@apache.org> wrote:
> Probably a bug then.
>
> Are you going to be making improvements to query
> transformation/optimization as part of your work on the enhanced SERVICE
> handling on the active PR?
>
>     Andy
>
> On 03/06/2022 10:39, Claus Stadler wrote:
> > Hi again,
> >
> > I think the point was missed; what I was actually after is that in the
> > following query a "join" is optimized into a "sequence"
> > and I wonder whether this is the correct behavior if a LIMIT/OFFSET is
> > present.
> >
> > So running the following query with optimize enabled/disabled gives
> > different results:
> >
> > SELECT * {
> >   SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
> >   SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x } LIMIT 1 }
> > }
> >
> > ➜ bin ./arq --query service-query.rq
> >
> > (sequence !!!!!
> >   (service <https://dbpedia.org/sparql>
> >     (slice _ 5
> >       (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
> >   (service <https://dbpedia.org/sparql>
> >     (slice _ 1
> >       (bgp (triple ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x)))))
> >
> > -------------------------------------------------------------------------------
> > | s                                                   | x                     |
> > ===============================================================================
> > | <http://dbpedia.org/resource/Aarti_Mukherjee>       | "Aarti Mukherjee"@en  |
> > | <http://dbpedia.org/resource/Abatte_Barihun>        | "Abatte Barihun"@en   |
> > | <http://dbpedia.org/resource/Abby_Abadi>            | "Abby Abadi"@en       |
> > | <http://dbpedia.org/resource/Abd_al_Malik_(rapper)> | "Abd al Malik"@de     |
> > | <http://dbpedia.org/resource/Abdul_Wahid_Khan>      | "Abdul Wahid Khan"@en |
> > -------------------------------------------------------------------------------
> >
> > ./arq --explain --optimize=no --query service-query.rq
> >
> > (join !!!!!
> >   (service <https://dbpedia.org/sparql>
> >     (slice _ 5
> >       (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
> >   (service <https://dbpedia.org/sparql>
> >     (slice _ 1
> >       (bgp (triple ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x)))))
> >
> > ---------
> > | s | x |
> > =========
> > ---------
> >
> > Cheers,
> >
> > Claus
> >
> > On 03.06.22 10:22, Andy Seaborne wrote:
> >>
> >> On 02/06/2022 21:19, Claus Stadler wrote:
> >>> Hi,
> >>>
> >>> I noticed some interesting results when using SERVICE with a sub
> >>> query with a slice (limit / offset).
> >>>
> >>> Preliminary Remark:
> >>>
> >>> Because SPARQL semantics is bottom up, a query such as the following
> >>> will not yield bindings for ?x:
> >>>
> >>> SELECT * {
> >>>   SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
> >>>   SERVICE <https://dbpedia.org/sparql> { BIND(?s AS ?x) }
> >>> }
> >>
> >> The query plan for that is:
> >>
> >> (join
> >>   (service <https://dbpedia.org/sparql>
> >>     (slice _ 5
> >>       (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
> >>   (service <https://dbpedia.org/sparql>
> >>     (extend ((?x ?s))
> >>       (table unit))))
> >>
> >> which has not had any optimization applied. ARQ checks scopes before
> >> doing any transformation.
> >>
> >> Change BIND(?s AS ?x) to BIND(?s1 AS ?x)
> >>
> >> and it will have (join) replaced by (sequence)
> >>
> >> -----------------------------------------------------------
> >> | s                                                   | x |
> >> ===========================================================
> >> | <http://dbpedia.org/resource/Aarti_Mukherjee>       |   |
> >> | <http://dbpedia.org/resource/Abatte_Barihun>        |   |
> >> | <http://dbpedia.org/resource/Abby_Abadi>            |   |
> >> | <http://dbpedia.org/resource/Abd_al_Malik_(rapper)> |   |
> >> | <http://dbpedia.org/resource/Abdul_Wahid_Khan>      |   |
> >> -----------------------------------------------------------
> >>
> >> LIMIT 1 is a no-op - the second SERVICE always evals to one row of no
> >> columns. Which makes the second SERVICE the join identity and the
> >> result is the first SERVICE.
> >>
> >> Column ?x is only in the display because it is in "SELECT *"
> >>
> >>> Query engines, such as Jena, attempt to optimize execution. For
> >>> instance, in the following query,
> >>> instead of retrieving all labels, jena uses each binding for a
> >>> Musical Artist to perform a lookup at the service.
> >>>
> >>> The result is semantically equivalent to bottom up evaluation
> >>> (without result set limits) - just much faster.
> >>>
> >>> SELECT * {
> >>>   SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
> >>>   SERVICE <https://dbpedia.org/sparql> { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x }
> >>> }
> >>>
> >>> The main point:
> >>>
> >>> However, the following query with ARQ interestingly yields one
> >>> binding for every musical artist - which contradicts the bottom-up
> >>> paradigm:
> >>>
> >>> SELECT * {
> >>>   SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
> >>>   SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x } LIMIT 1 }
> >>> }
> >>>
> >>> <http://dbpedia.org/resource/Aarti_Mukherjee> "Aarti Mukherjee"@en
> >>> <http://dbpedia.org/resource/Abatte_Barihun> "Abatte Barihun"@en
> >>> ... 3 more results ...
> >>>
> >>> With bottom-up semantics, the second service clause would only fetch
> >>> a single binding so in the unlikely event that it happens to join
> >>> with a musical artist I'd expect at most one binding
> >>> in the overall result set.
> >>>
> >>> Now I wonder whether this is a bug or a feature.
> >>>
> >>> I know that Jena's VarFinder is used to decide whether to perform a
> >>> bottom-up evaluation using OpJoin or a correlated join using
> >>> OpSequence which results in the different outcomes.
> >>>
> >>> The SPARQL spec doesn't say much about the semantics of Service
> >>> (https://www.w3.org/TR/sparql11-query/#sparqlAlgebraEval)
> >>
> >> It isn't about the semantics of SERVICE. It's the (join) local-side.
> >>
> >>> So I wonder which behavior is expected when using SERVICE with
> >>> SLICE'd queries.
> >>
> >> "SERVICE { pattern }" executes "SELECT * { pattern }" at the far end,
> >> LIMITS and all.
> >>
> >>     Andy
> >>
> >>>
> >>> Cheers,
> >>>
> >>> Claus
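
For anyone following the thread, the difference between the two plans can be reproduced without a remote endpoint. The following is only a toy sketch in Python, not ARQ code: the data, the `ex:` names and the helper functions (`service_artists`, `service_labels`, `join`, `sequence`) are all invented for illustration. It assumes that (join) evaluates both sides independently and then joins them, while (sequence) pushes each left-hand binding into the right-hand side before the slice is applied.

```python
from itertools import islice

# Invented stand-in for the remote data: (subject, label) pairs.
LABELS = [
    ("ex:Thing0", "Unrelated thing"),
    ("ex:Artist1", "Artist One"),
    ("ex:Artist2", "Artist Two"),
    ("ex:Artist3", "Artist Three"),
]
ARTISTS = ["ex:Artist1", "ex:Artist2", "ex:Artist3"]


def service_artists(limit):
    """Left SERVICE: SELECT * { ?s a :MusicalArtist } LIMIT n."""
    return [{"s": s} for s in islice(ARTISTS, limit)]


def service_labels(limit, s=None):
    """Right SERVICE: SELECT * { ?s rdfs:label ?x } LIMIT n.

    If `s` is given, it is substituted for ?s before the slice is
    applied (the correlated case); otherwise ?s is unbound.
    """
    rows = ({"s": subj, "x": label} for subj, label in LABELS
            if s is None or subj == s)
    return list(islice(rows, limit))


def compatible(a, b):
    """Two solutions are compatible if they agree on shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def join(left, right):
    """Bottom-up join: both sides were evaluated independently."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def sequence(left, right_fn):
    """Correlated evaluation: each left row is pushed into the right side."""
    return [{**l, **r} for l in left for r in right_fn(l["s"])
            if compatible(l, r)]


left = service_artists(limit=3)
bottom_up = join(left, service_labels(limit=1))
correlated = sequence(left, lambda s: service_labels(limit=1, s=s))

print("join:    ", bottom_up)
print("sequence:", correlated)
```

In this toy setup the bottom-up join is empty, because the single row returned under LIMIT 1 happens not to be an artist, while the correlated evaluation returns one label per artist. That mirrors the two result tables quoted above, under the stated assumptions about how the two operators evaluate.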