Interesting, Lorenz; thanks for that pointer!

nit: Looks like maybe the compatibility matrix needs to be updated for
recent (>4.0) versions of Jena?
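
For anyone finding this thread later, here is a rough sketch of what Matt's
query might look like against the text index. It assumes the dataset is
wrapped in a text dataset whose entity map indexes the county, district and
parish properties (that configuration is not shown here), and the he: prefix
is only my shorthand for the schema namespace in Matt's query. Note that
text:query matches whole tokens via Lucene rather than substrings, so it is
not an exact replacement for CONTAINS; with the default analyzer the match
should be case-insensitive.

```sparql
PREFIX text: <http://jena.apache.org/text#>
# Assumed shorthand for the namespace used in the original query.
PREFIX he:   <http://www.historicengland.org.uk/data/schema/>

SELECT DISTINCT ?s ?name
WHERE {
  # Each branch asks the Lucene index for subjects whose indexed
  # property value contains the token "lewes".
  { ?s text:query (he:county "lewes") }
  UNION
  { ?s text:query (he:district "lewes") }
  UNION
  { ?s text:query (he:parish "lewes") }

  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
}
LIMIT 10
```

DISTINCT is there because a subject that matches in more than one property
would otherwise come back once per matching branch.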

On Wed, Dec 8, 2021 at 3:42 AM Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:

> It does indeed, you just have to set it up initially, see docs:
> https://jena.apache.org/documentation/query/text-query.html
>
> On 08.12.21 11:47, Matt Whitby wrote:
> > Jena has a text index?
> >
> > On Wed, 8 Dec 2021 at 10:07, Lorenz Buehmann <
> > buehm...@informatik.uni-leipzig.de> wrote:
> >
> >> Even if it's not the strings leading to performance issues, using the
> >> Jena text index would probably be more efficient
> >>
> >> On 08.12.21 10:38, Matt Whitby wrote:
> >>> Fuseki. No inference. TDB2.
> >>>
> >>> M
> >>>
> >>> On Wed, 8 Dec 2021 at 09:25, Andy Seaborne <a...@apache.org> wrote:
> >>>
> >>>> Lots of questions! Details matter!!
> >>>>
> >>>> On 08/12/2021 09:05, Matt Whitby wrote:
> >>>>> It's hosted in a container in Azure.
> >>>> (Jena storage layer)
> >>>>
> >>>> Using TDB1? TDB2?
> >>>>
> >>>>> I test it via Postman (though we're writing a RESTful API to sit on
> >>>>> top).
> >>>> So this is Fuseki? Is there any inference being used?
> >>>>
> >>>>        Andy
> >>>>
> >>>>> On Wed, 8 Dec 2021 at 09:00, Andy Seaborne <a...@apache.org> wrote:
> >>>>>
> >>>>>> Hi Matt,
> >>>>>>
> >>>>>> That query does not look couple-of-minutes expensive.
> >>>>>>
> >>>>>> Could you run it removing parts to see what happens? e.g. Remove one
> >>>>>> OPTIONAL and its associated part of the filter.
> >>>>>>
> >>>>>> Which storage layer are you using?
> >>>>>>
> >>>>>>         Andy
> >>>>>>
> >>>>>> On 07/12/2021 20:18, aj...@apache.org wrote:
> >>>>>>> On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <matt.whi...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I dare say running an lcase against each field doesn't help matters,
> >>>>>>>> but with no other way of doing a case-insensitive search (well, Regex -
> >>>>>>>> but who likes Regex?) I'm not sure.
> >>>>>>>
> >>>>>>>
> >>>>>>> On this point alone, if it does turn out that string processing is what
> >>>>>>> is costing you time, you might adjust your data to include a convenience
> >>>>>>> property with county, district, and parish in lowercase. Then you could
> >>>>>>> do a more direct (and cheaper) match.
> >>>>>>>
> >>>>>>> That having been said, it seems unlikely to me that timed-out queries are
> >>>>>>> due to something as cheap as lowercasing. Have you tried peeling off some
> >>>>>>> of those OPTIONALs to see how much they cost?
> >>>>>>>
> >>>>>>> Adam
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Dec 7, 2021, 1:55 PM Matt Whitby <matt.whi...@gmail.com> wrote:
> >>>>>>>> I have a Sparql question if that's okay.
> >>>>>>>>
> >>>>>>>> There are only around 8m triples in our test data, so pretty small.
> >>>>>>>>
> >>>>>>>> The query takes a good couple of minutes to run (and sometimes just
> >>>>>>>> times out).
> >>>>>>>>
> >>>>>>>> I dare say running an lcase against each field doesn't help matters,
> >>>>>>>> but with no other way of doing a case-insensitive search (well, Regex -
> >>>>>>>> but who likes Regex?) I'm not sure.
> >>>>>>>>
> >>>>>>>> Any obvious ways to make it less bad?
> >>>>>>>>
> >>>>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> >>>>>>>> select ?s ?name
> >>>>>>>> where {
> >>>>>>>>
> >>>>>>>> ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
> >>>>>>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/county> ?county}.
> >>>>>>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/district> ?district}.
> >>>>>>>> OPTIONAL {?s <http://www.historicengland.org.uk/data/schema/parish> ?parish}.
> >>>>>>>>
> >>>>>>>> FILTER (CONTAINS(lcase(?county),"lewes") || CONTAINS(
> >>>>>>>> lcase(?district),"lewes") || CONTAINS( lcase(?parish),"lewes"))
> >>>>>>>>
> >>>>>>>> }
> >>>>>>>> limit 10
> >>>>>>>>
> >
>
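
For comparison, a rough sketch of the convenience-property idea from earlier
in the thread. The he:searchText property is hypothetical (nothing in the
thread names it), and he: is again only shorthand for the schema namespace.
The idea is to store county, district and parish concatenated and lowercased
once, so the search filters on a single bound variable with no OPTIONALs and
no lcase calls at query time. First a one-off SPARQL Update, then the query:

```sparql
PREFIX he: <http://www.historicengland.org.uk/data/schema/>

# One-off update: build a lowercased search string per subject.
# COALESCE falls back to "" when a property is missing.
INSERT { ?s he:searchText ?text }
WHERE {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name .
  OPTIONAL { ?s he:county ?county }
  OPTIONAL { ?s he:district ?district }
  OPTIONAL { ?s he:parish ?parish }
  BIND (LCASE(CONCAT(COALESCE(STR(?county), ""), " ",
                     COALESCE(STR(?district), ""), " ",
                     COALESCE(STR(?parish), ""))) AS ?text)
}
```

```sparql
PREFIX he: <http://www.historicengland.org.uk/data/schema/>

# Search against the precomputed lowercase string only.
SELECT DISTINCT ?s ?name
WHERE {
  ?s <http://www.historicengland.org.uk/data/schema/simplename/name> ?name ;
     he:searchText ?text .
  FILTER (CONTAINS(?text, "lewes"))
}
LIMIT 10
```

This still scans every he:searchText value, so it is only likely to help if
the string handling or the OPTIONALs turn out to be the expensive part.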
