Re: Re: Query - Update

Lorenz Buehmann Fri, 10 Dec 2021 23:02:41 -0800

Minor fix: the regex for ?county should be "east sussex" for the first two options, and for the property path solution it should be


regex(?X, "east sussex|lewes", "i")


On 10.12.21 12:44, Andy Seaborne wrote:

Matt - thanks for the update

Other ways to speed the query up are:
* Use regex - I know you don't like regex, but the regex is compiled only once:
FILTER ( regex(?county, "lewes", "i")
       || regex(?district, "lewes", "i")
       || regex(?parish, "lewes", "i")
       )

* Use UNION:
This exploits the data shape: each optional is independent, and the overall filter means a row with no matches at all never gets out.
{
  ?s heng-schema:county ?county .
  FILTER ( regex(?county, "lewes", "i") )
} UNION {
  ?s heng-schema:district ?district .
  FILTER ( regex(?district, "lewes", "i") )
} UNION {
  ?s heng-schema:parish ?parish .
  FILTER ( regex(?parish, "lewes", "i") )
}
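
Put together as a complete query, the UNION form might look like the sketch below. The two PREFIX IRIs are assumptions inferred from the full IRIs in the original query further down the thread (which writes the predicate IRIs both with and without a trailing slash, so adjust them to match the data), and "east sussex" for ?county follows the correction at the top of this message.

```sparql
# Assumed prefixes, inferred from the full IRIs quoted later in the thread.
PREFIX heng-schema: <http://www.historicengland.org.uk/data/schema/>
PREFIX simplename:  <http://www.historicengland.org.uk/data/schema/simplename/>

# DISTINCT because a subject matching in more than one branch
# would otherwise appear once per matching branch.
SELECT DISTINCT ?s ?name
WHERE {
  ?s simplename:name ?name .
  {
    ?s heng-schema:county ?county .
    FILTER ( regex(?county, "east sussex", "i") )
  } UNION {
    ?s heng-schema:district ?district .
    FILTER ( regex(?district, "lewes", "i") )
  } UNION {
    ?s heng-schema:parish ?parish .
    FILTER ( regex(?parish, "lewes", "i") )
  }
}
LIMIT 10
```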

* Use a property path:

where {
  ?s simplename:name ?name .
  ?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
  FILTER ( regex(?X, "lewes", "i") )
}

although that might play badly with the LIMIT - depends on the data

See below about comparing to GraphDB:

On 10/12/2021 07:38, Lorenz Buehmann wrote:
Yeah, as expected, putting FILTER into OPTIONAL can help.

Just as a comment, the semantics is a bit different between


SELECT ?s ?o {
?s a :C .
OPTIONAL {
     ?s <p> ?o
}
FILTER(?o = "val")
}

and

SELECT ?s ?o {
?s a :C .
OPTIONAL {
     ?s <p> ?o
     FILTER(?o = "val")
}
}
In the first query the FILTER fails if there is no ?o at all (comparing an unbound ?o raises an error, which counts as false), so ?s bindings might be dropped. In the second you'll always get all ?s bindings. That is the reason why no optimizer will push the filter into the OPTIONAL pattern.
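
A minimal sketch of that difference, using inline VALUES data instead of a real dataset so each query can be run on its own against any endpoint. The example.org prefix and the three subjects :a, :b and :c are made up for illustration.

```sparql
# Query 1: FILTER outside the OPTIONAL.
# :b (?o = "other") and :c (no ?o at all) are both dropped.
PREFIX : <http://example.org/>
SELECT ?s ?o {
  VALUES ?s { :a :b :c }
  OPTIONAL { VALUES (?s ?o) { (:a "val") (:b "other") } }
  FILTER(?o = "val")
}

# Query 2 (run separately): FILTER inside the OPTIONAL.
# All three subjects are kept; only :a gets a binding for ?o.
PREFIX : <http://example.org/>
SELECT ?s ?o {
  VALUES ?s { :a :b :c }
  OPTIONAL {
    VALUES (?s ?o) { (:a "val") (:b "other") }
    FILTER(?o = "val")
  }
}
```

The first should return a single row (:a, "val"); the second should return three rows, with ?o unbound for :b and :c.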
Can you give some numbers on the current runtime of the query now? Did you try the fulltext index?
I also saw your thread on SO where you tried GraphDB as well. Any comparison numbers so far?
To be comparable the query needs to be with the LIMIT or turned into a SELECT (COUNT(*) AS ?C) ...
because the order items come off the indexes will have an effect on time for LIMIT 10.
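
As a sketch, the COUNT form for timing could look like this, here applied to the property path variant. It makes the engine enumerate every match instead of stopping after the first ten, so the result does not depend on index order. The PREFIX IRIs are assumptions inferred from the full IRIs in the original query, not confirmed against the data.

```sparql
# Assumed prefixes, inferred from the full IRIs in the original query.
PREFIX heng-schema: <http://www.historicengland.org.uk/data/schema/>
PREFIX simplename:  <http://www.historicengland.org.uk/data/schema/simplename/>

# Counts solutions, not distinct subjects: a subject that matches via
# more than one of the three predicates is counted once per match.
SELECT (COUNT(*) AS ?C)
WHERE {
  ?s simplename:name ?name .
  ?s heng-schema:county|heng-schema:district|heng-schema:parish ?X .
  FILTER ( regex(?X, "east sussex|lewes", "i") )
}
```
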
On 09.12.21 23:17, Matt Whitby wrote:
James was kind enough to spend some time talking me through the query.

My original query (which timed out) was:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s ?name
where {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/county> ?county } .
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/district/> ?district } .
  OPTIONAL { ?s <http://www.historicengland.org.uk/data/schema/parish> ?parish } .

  FILTER ( CONTAINS(lcase(?county), "east sussex")
        || CONTAINS(lcase(?district), "lewes")
        || CONTAINS(lcase(?parish), "lewes") )

}
limit 10

Putting the FILTER under each statement helped it immensely.

select ?s
where {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
  ?s <http://www.historicengland.org.uk/data/schema/parish/> ?parish .
  FILTER (CONTAINS(lcase(?parish), "lewes"))
  ?s <http://www.historicengland.org.uk/data/schema/district/> ?district .
  FILTER (CONTAINS(lcase(?district), "lewes"))
  ?s <http://www.historicengland.org.uk/data/schema/county/> ?county .
  FILTER (CONTAINS(lcase(?county), "east sussex"))
}

Putting back the OPTIONAL and running it a third time slowed it down
(though not as badly as the first iteration).



M
