Re: Sparql Query

Andy Seaborne Thu, 09 Dec 2021 06:43:21 -0800



On 08/12/2021 23:22, Jeff Lerman wrote:

Interesting Lorenz; thanks for that pointer!

nit: Looks like maybe the compatibility matrix needs to be updated for
recent (>4.0) versions of Jena?


Fixed, thanks.


On Wed, Dec 8, 2021 at 3:42 AM Lorenz Buehmann <
[email protected]> wrote:

It does indeed, you just have to set it up initially, see docs:
https://jena.apache.org/documentation/query/text-query.html
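
As a rough sketch of what a text-index lookup could look like once the
index is set up: this assumes the dataset is wrapped in a text dataset
as the docs describe, and that the county, district and parish
predicates (IRIs assumed from the query later in this thread) are each
mapped to an indexed field. With Lucene's standard analyzer the terms
are lowercased, so the match is case-insensitive, but it matches whole
tokens rather than arbitrary substrings, so it is not an exact
equivalent of CONTAINS.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX he:   <http://www.historicengland.org.uk/data/schema/>

SELECT DISTINCT ?s ?name
WHERE {
  # Each branch asks the text index for subjects whose indexed field
  # contains the token "lewes" (assumes all three fields are indexed).
  { ?s text:query (he:county   'lewes') }
  UNION
  { ?s text:query (he:district 'lewes') }
  UNION
  { ?s text:query (he:parish   'lewes') }

  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
}
LIMIT 10
```

DISTINCT is there because a subject matching in more than one field
would otherwise appear once per matching branch.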

On 08.12.21 11:47, Matt Whitby wrote:

Jena has a text index?

On Wed, 8 Dec 2021 at 10:07, Lorenz Buehmann <
[email protected]> wrote:

Even if it's not the strings leading to performance issues, using the
Jena text index is likely to be more efficient.

On 08.12.21 10:38, Matt Whitby wrote:

Fuseki. No inference. TDB2.

M

On Wed, 8 Dec 2021 at 09:25, Andy Seaborne <[email protected]> wrote:

Lots of questions! Details matter!!

On 08/12/2021 09:05, Matt Whitby wrote:

It's hosted in a container in Azure.

(Jena storage layer)

Using TDB1? TDB2?

I test it via Postman (though we're writing a RESTful API to sit on
top).

So this is Fuseki? Is there any inference being used?

        Andy

On Wed, 8 Dec 2021 at 09:00, Andy Seaborne <[email protected]> wrote:

Hi Matt,

That query does not look couple-of-minutes expensive.

Could you run it removing parts to see what happens? e.g. Remove one
OPTIONAL and its associated part of the filter.

Which storage layer are you using?

         Andy
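
One way to run that experiment, as a sketch: drop the parish OPTIONAL
and its CONTAINS term, time the result, then repeat for the other two.
The predicate IRIs are taken from the query at the bottom of the
thread, with closing angle brackets assumed where the archived copy
lost them.

```sparql
PREFIX he: <http://www.historicengland.org.uk/data/schema/>

SELECT ?s ?name
WHERE {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

  OPTIONAL { ?s he:county   ?county }
  OPTIONAL { ?s he:district ?district }
  # parish OPTIONAL removed, together with its term in the FILTER

  FILTER (CONTAINS(LCASE(?county), "lewes") || CONTAINS(LCASE(?district), "lewes"))
}
LIMIT 10
```

If one variant is dramatically faster than the others, the removed
pattern is the likely culprit.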

On 07/12/2021 20:18, [email protected] wrote:

On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <[email protected]> wrote:

I dare say running an lcase against each field doesn't help matters,
but with no other way of doing a case-insensitive search (well, Regex -
but who likes Regex?) I'm not sure.


On this point alone, if it does turn out that string processing is what
is costing you time, you might adjust your data to include a
convenience property with county, district, and parish in lowercase.
Then you could do a more direct (and cheaper) match.
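
A minimal sketch of that idea as a SPARQL Update, assuming the three
predicate IRIs used in the query at the bottom of the thread. The
property name he:placeLower is hypothetical, invented here for
illustration, and the update would need re-running (or folding into
the load process) whenever the data changes.

```sparql
PREFIX he: <http://www.historicengland.org.uk/data/schema/>

# he:placeLower is a hypothetical convenience property,
# not part of the original data.
INSERT { ?s he:placeLower ?lower }
WHERE {
  # Collect county, district and parish values via a path alternation.
  ?s he:county|he:district|he:parish ?place .
  # STR() drops any language tag or datatype before lowercasing.
  BIND (LCASE(STR(?place)) AS ?lower)
}
```

The search could then use a single required pattern,
`?s he:placeLower ?lower . FILTER (CONTAINS(?lower, "lewes"))`, with no
OPTIONALs and no per-row LCASE call.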

That having been said, it seems unlikely to me that timed-out queries
are due to something as cheap as lowercasing. Have you tried peeling
off some of those OPTIONALs to see how much they cost?

Adam


On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <[email protected]> wrote:

I have a Sparql question if that's okay.

There are only around 8m triples in our test data, so pretty small.

The query takes a good couple of minutes to run (and sometimes just
times out).

I dare say running an lcase against each field doesn't help matters,
but with no other way of doing a case-insensitive search (well, Regex -
but who likes Regex?) I'm not sure.

Any obvious ways to make it less bad?

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?s ?name
where {

?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .

OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/county> ?county}.
OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/district> ?district}.
OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/parish> ?parish}.

FILTER (CONTAINS(lcase(?county),"lewes") || CONTAINS(lcase(?district),"lewes") || CONTAINS(lcase(?parish),"lewes"))

}
limit 10
