Minor fix: the regex for ?county should be "east sussex" in the first two options, and for the property path solution it should be

regex(?X, "east sussex|lewes", "i")

On 10.12.21 12:44, Andy Seaborne wrote:
Matt - thanks for the update

Other ways to speed the query up are:

* Use regex. I know you don't like regex, but the regex is compiled only once:

FILTER ( regex(?county, "lewes", "i")
       || regex(?district, "lewes", "i")
       || regex(?parish, "lewes", "i")
       )
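As a rough analogy only, here is the same idea in Python: the case-insensitive pattern is compiled once and then applied to each of the three values. The dict-per-resource data is hypothetical, and a missing key stands in for an unbound variable. As with SPARQL's ||, where "error || true" is true, a missing value in one field does not stop a match in another.

```python
import re

# Compile the case-insensitive pattern once, analogous to a SPARQL
# regex() call whose pattern is a constant string.
PATTERN = re.compile("lewes", re.IGNORECASE)


def matches_any(resource: dict) -> bool:
    """True if county, district or parish (any of which may be missing) matches.

    A missing value is skipped, mirroring how an unbound variable in one
    regex() term does not prevent another || alternative from succeeding.
    """
    for key in ("county", "district", "parish"):
        value = resource.get(key)
        # re.search is unanchored, like SPARQL regex().
        if value is not None and PATTERN.search(value):
            return True
    return False


# Hypothetical resources, not taken from the real dataset.
resources = [
    {"county": "East Sussex", "district": "Lewes", "parish": "Lewes"},
    {"county": "East Sussex", "district": "Wealden"},
    {"parish": "LEWES"},
    {},
]

print([matches_any(r) for r in resources])  # [True, False, True, False]
```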

* Use UNION:

This exploits the data shape: each OPTIONAL is independent, and the overall FILTER means a row with no match at all never gets out anyway, so nothing is lost by splitting the query into alternatives.

{
  ?s heng-schema:county ?county .
  FILTER ( regex(?county, "lewes", "i") )
} UNION {
  ?s heng-schema:district ?district .
  FILTER ( regex(?district, "lewes", "i") )
} UNION {
  ?s heng-schema:parish ?parish .
  FILTER ( regex(?parish, "lewes", "i") )
}
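A toy Python model of why the UNION form returns the same subjects as the OR over OPTIONALs: a subject with no match is filtered out either way. It assumes hypothetical triples with at most one value per property and leaves out the simplename:name pattern. One difference does show up: a subject that matches in two branches comes back once per branch, so a DISTINCT may be wanted.

```python
import re

PATTERN = re.compile("lewes", re.IGNORECASE)

# Hypothetical (subject, property, value) triples standing in for the data.
TRIPLES = [
    ("s1", "county", "East Sussex"),
    ("s1", "district", "Lewes"),
    ("s1", "parish", "Lewes"),
    ("s2", "district", "Wealden"),
    ("s3", "parish", "Lewes"),
    ("s4", "county", "Kent"),
]
PROPS = ("county", "district", "parish")


def or_over_optionals(triples):
    """One row per subject (single-valued properties assumed);
    keep it if any of the optional values matches."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    return {
        s
        for s, vals in by_subject.items()
        if any(PATTERN.search(vals[p]) for p in PROPS if p in vals)
    }


def union_of_branches(triples):
    """Each UNION branch matches one property on its own; rows are concatenated."""
    rows = []
    for prop in PROPS:
        rows += [s for s, p, o in triples if p == prop and PATTERN.search(o)]
    return rows


print(sorted(or_over_optionals(TRIPLES)))  # ['s1', 's3']
print(union_of_branches(TRIPLES))          # ['s1', 's1', 's3']
```

Deduplicating the UNION rows gives exactly the subjects kept by the OPTIONAL version.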

* Use a property path:

where {
  ?s simplename:name ?name .
  ?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
  FILTER ( regex(?X, "lewes", "i") )
}

although that might interact badly with the LIMIT; it depends on the data.

See below about comparing to GraphDB:

On 10/12/2021 07:38, Lorenz Buehmann wrote:
Yeah, as expected, putting FILTER into OPTIONAL can help.

Just as a comment, the semantics are a bit different between

SELECT ?s ?o {
?s a :C .
OPTIONAL {
     ?s <p> ?o
}
FILTER(?o = "val")
}

and

SELECT ?s ?o {
?s a :C .
OPTIONAL {
     ?s <p> ?o
     FILTER(?o = "val")
}
}

In the first query, the FILTER raises an error (which counts as false) if there is no ?o at all, so ?s bindings might be dropped. In the second you'll always get all ?s bindings. That is why no optimizer will push the filter into the OPTIONAL pattern.
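A small Python model of the two queries, assuming a hypothetical toy dataset in which "c" has no <p> value at all; None stands for an unbound ?o.

```python
# Hypothetical data: subjects of type :C and their values of <p>.
C_SUBJECTS = ["a", "b", "c"]
P_VALUES = {"a": ["val"], "b": ["other"]}  # "c" has no <p> at all


def filter_outside():
    """OPTIONAL { ?s <p> ?o } followed by FILTER(?o = "val") on the joined rows."""
    rows = []
    for s in C_SUBJECTS:
        values = P_VALUES.get(s, [])
        if values:
            rows += [(s, o) for o in values]
        else:
            rows.append((s, None))  # ?o left unbound
    # An unbound ?o makes the comparison an error, which FILTER treats as false.
    return [(s, o) for s, o in rows if o is not None and o == "val"]


def filter_inside():
    """OPTIONAL { ?s <p> ?o FILTER(?o = "val") }: the filter is part of the optional."""
    rows = []
    for s in C_SUBJECTS:
        matched = [o for o in P_VALUES.get(s, []) if o == "val"]
        if matched:
            rows += [(s, o) for o in matched]
        else:
            rows.append((s, None))  # ?s kept, ?o unbound
    return rows


print(filter_outside())  # [('a', 'val')]
print(filter_inside())   # [('a', 'val'), ('b', None), ('c', None)]
```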


Can you give some numbers on the current runtime of the query? Did you try the full-text index?

I also saw your thread on SO where you tried GraphDB as well. Any comparison numbers so far?

To be comparable, the query needs to be run without the LIMIT, or turned into a SELECT (COUNT(*) AS ?C) ...

because the order in which items come off the indexes will affect the time for LIMIT 10.


On 09.12.21 23:17, Matt Whitby wrote:
James was kind enough to spend some time talking me through the query.

My original query (which timed out) was:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s ?name
where {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/county> ?county } .
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/district/> ?district } .
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/parish> ?parish } .

  FILTER (   CONTAINS(lcase(?county), "east sussex")
          || CONTAINS(lcase(?district), "lewes")
          || CONTAINS(lcase(?parish), "lewes") )
}
limit 10

Putting the FILTER under each statement helped it immensely.

select ?s
where {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
  ?s <http://www.historicengland.org.uk/data/schema/parish/> ?parish .
  FILTER (CONTAINS(lcase(?parish), "lewes"))
  ?s <http://www.historicengland.org.uk/data/schema/district/> ?district .
  FILTER (CONTAINS(lcase(?district), "lewes"))
  ?s <http://www.historicengland.org.uk/data/schema/county/> ?county .
  FILTER (CONTAINS(lcase(?county), "east sussex"))
}

Putting back the OPTIONALs and running it a third time slowed it down
(though not as badly as the first iteration).



M
