Hi,

I want to be able to tell if SPARQL queries are interchangeable: their
semantics is the same and they lead to the same results. I'd appreciate any
tips on detecting interchangeable queries using Jena.

Here's what I tried so far. I start with computationally cheaper
comparisons and continue with more expensive ones if the compared queries
aren't detected as interchangeable. First, I compare the queries as strings
to check easy verbatim matches.

If the compared queries don't match, I parse them to instances of
org.apache.jena.query.Query and compare them
using org.apache.jena.sparql.core.QueryCompare. This detects
interchangeable queries that differ only in minor syntax, such as the
character case of SPARQL keywords (e.g., "SELECT" vs. "select"). However,
QueryCompare treats queries as different when, for example, one writes an
IRI in absolute form and the other writes the same IRI as a compact IRI
(prefixed name).
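
The first two steps can be sketched roughly as follows. This is a minimal
sketch, not the actual code: the class and method names (CheapComparison,
sameSyntax) are made up for illustration, and it assumes both strings parse
as SPARQL (QueryFactory.create() throws a QueryParseException otherwise).

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.core.QueryCompare;

public class CheapComparison {
    /**
     * Hypothetical helper: true if the two query strings are identical, or
     * if they parse to queries that QueryCompare considers equal.
     */
    public static boolean sameSyntax(String queryString1, String queryString2) {
        // Step 1: cheapest check, verbatim string equality.
        if (queryString1.equals(queryString2)) {
            return true;
        }
        // Step 2: parse both strings and compare the parsed queries.
        Query query1 = QueryFactory.create(queryString1);
        Query query2 = QueryFactory.create(queryString2);
        return QueryCompare.equals(query1, query2);
    }
}
```

With this helper, "SELECT * WHERE { ?s ?p ?o }" and
"select * where { ?s ?p ?o }" compare as equal, while queries that differ
in a variable name do not.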

Therefore, if queries aren't detected as interchangeable in this step, I
convert them to SPARQL algebra using the compile() method of
org.apache.jena.sparql.algebra.Algebra and compare the resulting algebra
objects. In this way, queries with absolute/compact forms of the same IRI
are treated as equal. However, there are other interchangeable queries
that produce unequal algebra. For example (the queries I mentioned in
https://mail-archives.apache.org/mod_mbox/jena-users/201607.mbox/browser):

# Query 1
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT *
WHERE {
  ?concept skos:broader [ skos:prefLabel ?broaderLabel ] .
}

# Query 2
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT *
WHERE {
  ?concept skos:broader/skos:prefLabel ?broaderLabel .
}
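
The algebra comparison from this step can be sketched as below. It is a
minimal sketch under assumptions: the class and method names
(AlgebraComparison, sameAlgebra) are hypothetical, and it relies on
Op.equals() for the comparison without any custom node mapping.

```java
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;

public class AlgebraComparison {
    /**
     * Hypothetical helper: true if both query strings compile to equal
     * SPARQL algebra expressions.
     */
    public static boolean sameAlgebra(String queryString1, String queryString2) {
        // Prefixes are resolved during parsing, so the algebra only
        // contains absolute IRIs.
        Op op1 = Algebra.compile(QueryFactory.create(queryString1));
        Op op2 = Algebra.compile(QueryFactory.create(queryString2));
        return op1.equals(op2);
    }
}
```

This treats a prefixed name and the corresponding absolute IRI as equal,
but it is also the comparison that reports Query 1 and Query 2 above as
unequal.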

This is why I try another step to detect interchangeable queries: algebra
optimization. Simply calling Algebra.optimize() on the example queries
doesn't make their algebra equal, but there's a work-around using a custom
NodeIsomorphismMap (see
https://mail-archives.apache.org/mod_mbox/jena-users/201607.mbox/browser)
under which the queries compare as equal.

Nevertheless, even with these provisions, there are other kinds of
interchangeable queries that are treated as distinct. For example:

- Queries using blank nodes or unprojected variables
- Queries with different order of UNION clauses
- Queries expressing the same disjunction using UNION, VALUES, or a
property path with alternatives
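
To illustrate what handling the UNION-order case would involve, here is a
toy sketch of canonicalization: nested UNIONs are flattened and their
branches sorted, so that any ordering yields the same canonical form. It
deliberately does not use Jena. The Pattern class is a made-up stand-in
for an algebra expression, and applying the same idea to Jena's algebra
(presumably as a Transform over OpUnion) is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnionCanonicalizer {
    /** Toy pattern: a leaf holding some text, or a UNION over branches. */
    public static final class Pattern {
        final String text;             // only set for leaves
        final List<Pattern> branches;  // empty for leaves

        private Pattern(String text, List<Pattern> branches) {
            this.text = text;
            this.branches = branches;
        }

        public static Pattern leaf(String text) {
            return new Pattern(text, Collections.<Pattern>emptyList());
        }

        public static Pattern union(Pattern... branches) {
            if (branches.length == 0) {
                throw new IllegalArgumentException("UNION needs a branch");
            }
            return new Pattern(null, Arrays.asList(branches));
        }

        boolean isLeaf() {
            return branches.isEmpty();
        }
    }

    /** Canonical string form: nested UNIONs flattened, branches sorted. */
    public static String canonical(Pattern pattern) {
        if (pattern.isLeaf()) {
            return pattern.text;
        }
        List<String> leaves = new ArrayList<>();
        collectLeaves(pattern, leaves);
        // Sorting keeps duplicates, so repeated branches stay distinct.
        Collections.sort(leaves);
        return "UNION" + leaves;
    }

    private static void collectLeaves(Pattern pattern, List<String> out) {
        if (pattern.isLeaf()) {
            out.add(pattern.text);
            return;
        }
        for (Pattern branch : pattern.branches) {
            collectLeaves(branch, out);
        }
    }
}
```

Two patterns are then treated as interchangeable when their canonical
strings are equal, regardless of how the UNION branches were ordered or
nested.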

I suspect there is a way to make the algebra optimization more
"aggressive", so that it produces equal algebra for the above kinds of
interchangeable queries. I read Rob Vesse's excellent slides on query
optimization in Jena (
http://events.linuxfoundation.org/sites/events/files/slides/SPARQL%20Optimisation%20101%20Tutorial.pdf)
and it seems to me that much of what I need is already possible in Jena. I
see there are many algebra transformers (
https://github.com/apache/jena/tree/master/jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize)
that can be enabled in the org.apache.jena.sparql.util.Context passed to
the Algebra.optimize() method. Would you recommend enabling some
optimizations that are not enabled in the default Context (i.e.
ARQ.getContext())? I also found some unreachable code
in org.apache.jena.sparql.algebra.optimize.Optimize (
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/algebra/optimize/Optimize.java#L136-L142).
Was it left in the code for documentation?
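
For reference, passing a modified Context to Algebra.optimize() could look
like the sketch below. The class and method names are hypothetical, and
ARQ.optInlineAssignments is only an arbitrary example of a symbol assumed
not to be switched on in the default context; it is not a recommendation
of which optimizations to enable.

```java
import org.apache.jena.query.ARQ;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;
import org.apache.jena.sparql.util.Context;

public class CustomOptimization {
    /**
     * Hypothetical helper: compiles a query and optimizes its algebra
     * with one extra optimization switched on.
     */
    public static Op optimizeWithExtras(String queryString) {
        // Work on a copy so that the global ARQ context stays untouched.
        Context context = ARQ.getContext().copy();
        // Example symbol only (assumption: not enabled by default).
        context.set(ARQ.optInlineAssignments, true);
        Op op = Algebra.compile(QueryFactory.create(queryString));
        return Algebra.optimize(op, context);
    }
}
```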

Overall, would you say that the approach to detecting interchangeable
queries via algebra optimizations is a good one? Would you suggest a
different approach?

- Jindřich

-- 
Jindřich Mynarz
http://mynarz.net/#jindrich
