Minor fix: the regex for ?county should be "east sussex" for the first
two options, and for the property path solution it should be
regex(?X, "east sussex|lewes", "i")
On 10.12.21 12:44, Andy Seaborne wrote:
Matt - thanks for the update
Other ways to speed the query up are:
* Use regex - I know you don't like regex but the regex is compiled
only once
FILTER ( regex(?county, "lewes", "i")
|| regex(?district, "lewes", "i")
|| regex(?parish, "lewes", "i")
)
* Use UNION:
This is exploiting the data shape: each optional is independent and
the overall filter means a row with no matches at all never gets through.
{
?s heng-schema:county ?county .
FILTER ( regex(?county, "lewes", "i") )
} UNION {
?s heng-schema:district ?district .
FILTER ( regex(?district, "lewes", "i") )
} UNION {
?s heng-schema:parish ?parish .
FILTER ( regex(?parish, "lewes", "i") )
}
* Use a property path:
where {
?s simplename:name ?name .
?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
FILTER ( regex(?X, "lewes", "i") )
}
although that might play badly with the LIMIT - depends on the data
See below about comparing to GraphDB:
On 10/12/2021 07:38, Lorenz Buehmann wrote:
Yeah, as expected, putting FILTER into OPTIONAL can help.
Just as a comment, the semantics is a bit different between
SELECT ?s ?o {
?s a :C .
OPTIONAL {
?s <p> ?o
}
FILTER(?o = "val")
}
and
SELECT ?s ?o {
?s a :C .
OPTIONAL {
?s <p> ?o
FILTER(?o = "val")
}
}
In the first query the FILTER fails whenever ?o is unbound (the
comparison raises an error, which counts as false), so ?s bindings
without a matching ?o are dropped. In the second you'll always get all
?s bindings. That is the reason why no optimizer will push the filter
into the OPTIONAL pattern.
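To make the difference concrete, here is a small self-contained sketch. The inline VALUES data and the example.org prefix are made up for illustration; they stand in for real triples, with :a having the value "val", :b having "other" and :c having no value at all.

```sparql
PREFIX : <http://example.org/>

# FILTER outside the OPTIONAL: applied after the left join.
# ?o = "val" is false for :b and an error (unbound ?o) for :c,
# so only the row for :a is returned.
SELECT ?s ?o {
  VALUES ?s { :a :b :c }
  OPTIONAL {
    VALUES (?s ?o) { (:a "val") (:b "other") }
  }
  FILTER(?o = "val")
}
```

With the FILTER moved inside, it becomes the condition of the left join rather than a test on the final rows:

```sparql
PREFIX : <http://example.org/>

# FILTER inside the OPTIONAL: all three ?s are returned.
# :a comes back with ?o = "val"; :b and :c come back with ?o unbound.
SELECT ?s ?o {
  VALUES ?s { :a :b :c }
  OPTIONAL {
    VALUES (?s ?o) { (:a "val") (:b "other") }
    FILTER(?o = "val")
  }
}
```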
Can you give some numbers on the current runtime of the query now?
Did you try the fulltext index?
I also saw your thread on SO where you tried GraphDB as well. Any
comparison numbers so far?
To be comparable, the query needs to be run with the LIMIT or turned into a
SELECT (COUNT(*) AS ?C) ...
because the order in which items come off the indexes affects the
time for LIMIT 10.
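As a rough sketch, the COUNT form of the property path variant might look like the query below. The two prefix IRIs are an assumption inferred from the full IRIs in Matt's original query and may need adjusting (for example, for trailing slashes) to match the data.

```sparql
# Assumed prefix IRIs, inferred from the full IRIs in the original query.
PREFIX heng-schema: <http://www.historicengland.org.uk/data/schema/>
PREFIX simplename:  <http://www.historicengland.org.uk/data/schema/simplename/>

# Counts every matching solution row, so the whole pattern is evaluated
# and the timing does not depend on which rows happen to come first.
SELECT (COUNT(*) AS ?C)
WHERE {
  ?s simplename:name ?name .
  ?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
  FILTER ( regex(?X, "east sussex|lewes", "i") )
}
```

Note that this counts solution rows rather than distinct ?s, so a subject matching on more than one property is counted once per match.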
On 09.12.21 23:17, Matt Whitby wrote:
James was kind enough to spend some time talking me through the query.
My original query (which timed out) was:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s ?name
where {
?s <http://www.historicengland.org.uk/data/schema/simplename/name>
?name .
OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/county>
?county}.
OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/district/>
?district}.
OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/parish>
?parish}.
FILTER (CONTAINS(lcase(?county),"east sussex") || CONTAINS(
lcase(?district),"lewes") || CONTAINS( lcase(?parish),"lewes"))
}
limit 10
Putting the FILTER under each statement helped it immensely.
select ?s
where {
?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
?s <http://www.historicengland.org.uk/data/schema/parish/> ?parish .
FILTER (CONTAINS(lcase(?parish),"lewes"))
?s <http://www.historicengland.org.uk/data/schema/district/>
?district .
FILTER (CONTAINS(lcase(?district),"lewes"))
?s <http://www.historicengland.org.uk/data/schema/county/> ?county .
FILTER (CONTAINS(lcase(?county),"east sussex"))
}
Putting back the OPTIONAL and running it a third time slowed it down
(though not as badly as the first iteration).
M