Re: Query - Update

Andy Seaborne Fri, 10 Dec 2021 03:44:55 -0800

Matt - thanks for the update

Other ways to speed the query up are:

* Use regex - I know you don't like regex, but the regex is compiled only once


FILTER ( regex(?county, "lewes", "i")
       || regex(?district, "lewes", "i")
       || regex(?parish, "lewes", "i")
       )

* Use UNION:

This exploits the data shape: each optional is independent, and the overall filter means that a ?s with no matches at all never gets out.


{
  ?s heng-schema:county ?county .
  FILTER ( regex(?county, "lewes", "i") )
} UNION {
  ?s heng-schema:district ?district .
  FILTER ( regex(?district, "lewes", "i") )
} UNION {
  ?s heng-schema:parish ?parish .
  FILTER ( regex(?parish, "lewes", "i") )
}
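
Put together as a complete query, the UNION form might look like the sketch below. The two prefix IRIs are assumptions inferred from the full IRIs in Matt's query quoted further down (some of which end in a trailing slash), so adjust them to whatever the data really uses. DISTINCT is added because a ?s that matches in more than one branch would otherwise come back once per branch.

```sparql
# Assumed prefixes, inferred from the full IRIs in Matt's query; check them against the data.
PREFIX heng-schema: <http://www.historicengland.org.uk/data/schema/>
PREFIX simplename:  <http://www.historicengland.org.uk/data/schema/simplename/>

SELECT DISTINCT ?s ?name
WHERE {
  ?s simplename:name ?name .
  {
    ?s heng-schema:county ?county .
    FILTER ( regex(?county, "lewes", "i") )
  } UNION {
    ?s heng-schema:district ?district .
    FILTER ( regex(?district, "lewes", "i") )
  } UNION {
    ?s heng-schema:parish ?parish .
    FILTER ( regex(?parish, "lewes", "i") )
  }
}
LIMIT 10
```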

* Use a property path:

where {
  ?s simplename:name ?name .
  ?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
  FILTER ( regex(?X, "lewes", "i") )
}

although that might play badly with the LIMIT - depends on the data

See below about comparing to GraphDB:

On 10/12/2021 07:38, Lorenz Buehmann wrote:

Yeah, as expected, putting FILTER into OPTIONAL can help.

Just as a comment, the semantics is a bit different between


SELECT ?s ?o {
?s a :C .
OPTIONAL {
     ?s <p> ?o
}
FILTER(?o = "val")
}

and

SELECT ?s ?o {
?s a :C .
OPTIONAL {
     ?s <p> ?o
     FILTER(?o = "val")
}
}

The first query evaluates to false in the FILTER if there is no ?o at all, thus, ?s bindings might be dropped. In the second you'll always get all ?s bindings. That is the reason why no optimizer will push the filter into the OPTIONAL pattern.
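
To make the difference concrete, here is a small self-contained sketch. The example.org prefix and the inline VALUES data are hypothetical, made up purely for illustration: :a has the value "val", :b has a different value, and :c has no value at all. With the FILTER outside the OPTIONAL, only :a should come back:

```sparql
PREFIX : <http://example.org/>

SELECT ?s ?o {
  VALUES ?s { :a :b :c }                                   # stands in for "?s a :C"
  OPTIONAL { VALUES (?s ?o) { (:a "val") (:b "other") } }  # stands in for "?s <p> ?o"
  FILTER(?o = "val")                                       # drops :b (false) and :c (unbound ?o)
}
```

With the FILTER inside the OPTIONAL, all three ?s should come back, and only :a has ?o bound:

```sparql
PREFIX : <http://example.org/>

SELECT ?s ?o {
  VALUES ?s { :a :b :c }
  OPTIONAL {
    VALUES (?s ?o) { (:a "val") (:b "other") }
    FILTER(?o = "val")                                     # only limits what the OPTIONAL adds
  }
}
```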
Can you give some numbers on the current runtime of the query now? Did you try the fulltext index?
I also saw your thread on SO where you tried GraphDB as well. Any comparison numbers so far?

To be comparable the query needs to be with the LIMIT or turned into a SELECT (COUNT(*) AS ?C) ...

because the order items come off the indexes will have an effect on time for LIMIT 10.
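
As a sketch, the counting form of the property path query could look like this. The prefix IRIs are again assumptions inferred from the full IRIs in Matt's query below. Counting makes the engine evaluate every match, so the timing does not depend on which matches happen to come off the indexes first.

```sparql
# Assumed prefixes, inferred from the full IRIs in Matt's query; check them against the data.
PREFIX heng-schema: <http://www.historicengland.org.uk/data/schema/>
PREFIX simplename:  <http://www.historicengland.org.uk/data/schema/simplename/>

SELECT (COUNT(*) AS ?C)
WHERE {
  ?s simplename:name ?name .
  ?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
  FILTER ( regex(?X, "lewes", "i") )
}
```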


On 09.12.21 23:17, Matt Whitby wrote:

James was kind enough to spend some time talking me through the query.

My original query (which timed out) was:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s ?name
where {

?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/county> ?county } .
OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/district/> ?district } .
OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/parish> ?parish } .

FILTER (CONTAINS(lcase(?county), "east sussex")
     || CONTAINS(lcase(?district), "lewes")
     || CONTAINS(lcase(?parish), "lewes"))

}
limit 10

Putting the FILTER under each statement helped it immensely.

select ?s
where {

?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

?s <http://www.historicengland.org.uk/data/schema/parish/> ?parish .
   FILTER (CONTAINS(lcase(?parish),"lewes"))
?s <http://www.historicengland.org.uk/data/schema/district/> ?district .
   FILTER (CONTAINS(lcase(?district),"lewes"))
?s <http://www.historicengland.org.uk/data/schema/county/> ?county .
   FILTER (CONTAINS(lcase(?county),"east sussex"))
}

Putting back the OPTIONAL and running it a third time slowed it down
(though not as badly as the first iteration).



M
