> For Q3, does "LIMIT 1" make any difference?

No, I'm trying it now and it is taking several minutes at least.


> except different cardinality. Because ?var5 isn't used eslewhere, is
> that waht was meant here?


I'm also not too sure about the meaning of the queries in general, I'm just
using
a large dataset of anonymized queries. I took one of the
files from here https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en
(they say those are from Wikidata's users) and executed a script to filter
out some types of queries, the result of that is a file with 27 million
queries.
But to answer your question about this specific query, since this is
Wikidata,
what I'd do is to run the query on https://query.wikidata.org/, and infer
the meaning
of the variables, ?var1 is some person and P570 date of death, etc... and
from the filters it seems
that they want to find those who died in the last 30 days and were born
after 1954. ?var5
usage seems to indicate that they want to find those whose cause of death
is not
known (because it's an OPTIONAL and !BOUND would filter out those where
there is a cause of death).
In this case it is reasonable that I get no results because the dataset I'm
using is older than 30 days.
So yes, you are right, they are equivalent for this query and with NOT
EXISTS would
probably be faster. But still, what I'd like to have is a way to stop the
processing
after a few seconds consistently, independent of the query.

I just did quick peek at the code and found that, correct me if I'm wrong,
the cancellation of
a query is propagated in a top down manner for the QueryIterators, in a way
that to notice
that the query was aborted you have to get the next result (I'm looking
specifically at QueryIterRepeatApply.java:65),
meanwhile the query might be stuck at some heavy processing in a lower
level, which might be the case for
nested OPTIONALs.
In the case of QueryIterMinus I see that in the constructor you have to
create the index for the right iterator, but
that index creation is not stoppable. (On a side note, did you try doing
QueryIterMinus similar to QueryIterOptionalIndex,
using QueryIterRepeatApply? I just have the impression that running that
subquery with left values bound and checking
for only one result could be faster, but this is just a suggestion in case
you haven't tried, don't take this
too seriously).

If you consider that those are things to be fixed and don't have it in high
priority I could give it a try and send a PR,
if it's ok with you.

Sorry if I seem too intrusive here, but I'm really interested in having the
timeout work well for any query!



On Fri, 30 Jul 2021 at 04:47, Andy Seaborne <[email protected]> wrote:

> Hi Cristóbal,
>
> I can't tell from just the queries.
>
> You say the first query has zero results so it is not the ORDER BY.
>
> The queries themselves may be inefficient and changing the order may
> have a significant effect (all data dependent).
>
> All three do not use FILTER NOT EXISTS where they could. That may well
> be faster.
>
> Q1 uses OPTIONAL !BOUND (see inline), Q2 and Q3 use MINUS.
>
> For Q3, does "LIMIT 1" make any difference?
>
>      Andy
>
> On 29/07/2021 15:57, Cristóbal Miranda wrote:
> > Hi Andy,
> >
> > I have three queries in which this happens:
> >
> > 1.
> >
> > SELECT ?var1  ?var2  ?var3  ?var4
> >> WHERE {
> >>    ?var1  <http://www.wikidata.org/prop/direct/P570>  ?var2 .
> >>   FILTER (  ( (  ?var2  >  "2016-07-30T00:00:00Z"^^<
> >> http://www.w3.org/2001/XMLSchema#dateTime>  ) )
> >> ) .
> >>   FILTER (  ( (  ?var2  >  (  NOW (  ) - "P32D"^^<
> >> http://www.w3.org/2001/XMLSchema#duration>  )  ) && (  ?var2  <  NOW
> (  )
> >>   ) )
> >> ) .
> >>    ?var1  <http://www.wikidata.org/prop/direct/P31>  <
> >> http://www.wikidata.org/entity/Q5> .
>
> ---------
> >>   OPTIONAL {
> >>    ?var1  <http://www.wikidata.org/prop/direct/P1196>  ?var5 .
> >>   }
> >>   FILTER (  ( !( BOUND (  ?var5  ) ) )
>
> same as
>
> FILTER
> NOT EXISTS { ?var1 <http://www.wikidata.org/prop/direct/P1196> ?var5 }
>
> except different cardinality. Because ?var5 isn't used eslewhere, is
> that waht was meant here?
>
> >> ) .
> >>    ?var1  <http://www.wikidata.org/prop/direct/P569>  ?var6 .
> >>   FILTER (  ( (  ?var6  >  "1954-12-31T00:00:00Z"^^<
> >> http://www.w3.org/2001/XMLSchema#dateTime>  ) )
> >> ) .
> >>   OPTIONAL {
> >>    ?var1  <http://wikiba.se/ontology#statements>  ?var3 .
> >>   }
> >>   OPTIONAL {
> >>    ?var1  <http://wikiba.se/ontology#sitelinks>  ?var4 .
> >>   }
> >> }
> >> ORDER BY  DESC( ?var4 ) DESC( ?var2 )ASC( ?var1 )
> >>
> >>
> > 2.
> >
> >> SELECT ?var1  ?var2
> >> WHERE {
> >>    ?var1  <http://www.wikidata.org/prop/direct/P1889>  ?var2 .
> >>    MINUS   {
> >>      ?var1  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  <
> >> http://wikiba.se/ontology#Property> .
> >>    }
> >> }
> >>
> >>
> > 3.
> >
> >> SELECT DISTINCT ?var1  ?var2  ?var3  ?var4
> >> WHERE {
> >>    ?var1  <http://www.wikidata.org/prop/direct/P569>  ?var3 .
> >>   FILTER (  ( (  ?var3  >  "1956-01-01T00:00:00Z"^^<
> >> http://www.w3.org/2001/XMLSchema#dateTime>  ) && (  ?var3  <
> >>   "1957-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>
> ) )
> >> ) .
> >>   OPTIONAL {
> >>    ?var1  <http://www.wikidata.org/prop/direct/P570>  ?var5 .
> >>   }
> >>   FILTER (  ( !( BOUND (  ?var5  ) ) )
> >> ) .
> >>    ?var2  <http://www.wikidata.org/prop/direct/P569>  ?var3 .
> >>   OPTIONAL {
> >>    ?var2  <http://www.wikidata.org/prop/direct/P570>  ?var6 .
> >>   }
> >>   FILTER (  ( !( BOUND (  ?var6  ) ) )
> >> ) .
> >>    ?var1  <http://www.w3.org/2000/01/rdf-schema#label>  ?var4 .
> >>    ?var2  <http://www.w3.org/2000/01/rdf-schema#label>  ?var4 .
> >>    ?var1  <http://www.wikidata.org/prop/direct/P31>  <
> >> http://www.wikidata.org/entity/Q5> .
> >>    ?var2  <http://www.wikidata.org/prop/direct/P31>  <
> >> http://www.wikidata.org/entity/Q5> .
> >>    ?var1  <http://www.wikidata.org/prop/direct/P21>  ?var7 .
> >>    ?var2  <http://www.wikidata.org/prop/direct/P21>  ?var7 .
> >>   FILTER (  ( (  STR (  ?var1  )  <  STR (  ?var2  )  ) )
> >> ) .
> >>    MINUS   {
> >>      ?var1  <http://www.wikidata.org/prop/direct/P7>  ?var2 .
> >>    }
> >>    MINUS   {
> >>      ?var1  <http://www.wikidata.org/prop/direct/P9>  ?var2 .
> >>    }
> >>    MINUS   {
> >>      ?var1  <http://www.wikidata.org/prop/direct/P1889>  ?var2 .
> >>    }
> >>    MINUS   {
> >>      ?var1  <http://www.wikidata.org/prop/direct/P460>  ?var2 .
> >>    }
> >> }
> >> LIMIT 500
> >
> >
> >
> > The first one is taking 141 minutes,  the second 32 minutes and the third
> > is still running.
> > I have run about 1200 queries, where 38 times the exception was thrown,
> but
> > interestingly not
> > in the first two queries, where I got 0 and 575925 results respectively.
> >
> > I'm using jena 4.1.0.
> >
> >
> >
> > On Thu, 29 Jul 2021 at 05:27, Andy Seaborne <[email protected]> wrote:
> >
> >> Hi Cristóbal,
> >>
> >> What's the query and which version of jena is this?
> >>
> >>       Andy
> >>
> >> On 28/07/2021 19:39, Cristóbal Miranda wrote:
> >>> Hello everyone,
> >>>
> >>> I'm trying to run a sequence of queries with TDB, using a
> >>> locally loaded dataset. I don't want to wait more than a few
> >>> seconds for each query to finish. My attempt to do this looks like
> >>> the following:
> >>>
> >>>>
> >>>> try {
> >>>>     RDFConnection rdfConnection =
> RDFConnectionFactory.connect(dataset);
> >>>>     QueryExecution queryExecution = rdfConnection.query(query);
> >>>>     queryExecution.setTimeout(timeoutMilliseconds);
> >>>>     ResultSet resultSet = queryExecution.execSelect();
> >>>>     while (resultSet.hasNext()) {
> >>>>       QuerySolution querySolution = resultSet.next();
> >>>>       ...
> >>>>     }
> >>>> } catch (QueryCancelledException e) {
> >>>>      ...
> >>>> }
> >>>
> >>>
> >>> The problem is that this is not working. With htop I see that
> >>> the process gets stuck in disk operations. One of the queries
> >>> took about 2 hours with the code above. An idea would
> >>> be trying to run this in a new thread and stopping the thread outside
> >> once
> >>> the timeout is reached, but I'm almost sure this wouldn't be a safe
> >>> way to stop the processing, even if it worked.
> >>>
> >>> Is there a better way to do this?
> >>>
> >>> Cristobal
> >>>
> >>
> >
>

Reply via email to