Hi Cristóbal,
I can't tell from just the queries.
You say the first query returns zero results, so the ORDER BY isn't the cause.
The queries themselves may be inefficient, and changing the order of the
patterns in them may have a significant effect (it's all data dependent).
None of the three uses FILTER NOT EXISTS where it could, and that may well
be faster.
Q1 uses OPTIONAL/!BOUND (see inline), Q2 uses MINUS, and Q3 uses both.
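As a rough illustration only, here is a toy in-memory model (not how Jena or TDB evaluates queries; the class `NegationDemo`, its `Triple` type and the method names are all hypothetical) of the three negation forms. It keeps the candidate subjects that have no value for a predicate, once via OPTIONAL plus !BOUND, once via FILTER NOT EXISTS, and once via MINUS.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model (NOT how Jena/TDB evaluates queries) of three ways to
// keep only the candidate subjects that have no value for predicate p.
final class NegationDemo {

    // Minimal stand-in for an RDF triple; real code would use Jena's Triple/Node.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // OPTIONAL { ?s <p> ?o }: one (?s, ?o) row per matching triple,
    // or a single (?s, null) row when nothing matches (null = unbound).
    static List<String[]> leftJoin(List<String> candidates, List<Triple> graph, String p) {
        List<String[]> rows = new ArrayList<>();
        for (String s : candidates) {
            boolean matched = false;
            for (Triple t : graph) {
                if (t.s.equals(s) && t.p.equals(p)) {
                    rows.add(new String[] {s, t.o});
                    matched = true;
                }
            }
            if (!matched) {
                rows.add(new String[] {s, null});
            }
        }
        return rows;
    }

    // OPTIONAL { ?s <p> ?o } FILTER ( !BOUND(?o) ): build every row, then discard bound ones.
    static List<String> optionalNotBound(List<String> candidates, List<Triple> graph, String p) {
        List<String> out = new ArrayList<>();
        for (String[] row : leftJoin(candidates, graph, p)) {
            if (row[1] == null) {
                out.add(row[0]);
            }
        }
        return out;
    }

    // FILTER NOT EXISTS { ?s <p> ?o }: per candidate, anyMatch stops at the first match.
    static List<String> filterNotExists(List<String> candidates, List<Triple> graph, String p) {
        List<String> out = new ArrayList<>();
        for (String s : candidates) {
            boolean exists = graph.stream().anyMatch(t -> t.s.equals(s) && t.p.equals(p));
            if (!exists) {
                out.add(s);
            }
        }
        return out;
    }

    // MINUS { ?s <p> ?o }: drop a candidate if some right-hand row binds the
    // shared variable ?s to the same value (MINUS ignores rows sharing no variable).
    static List<String> minus(List<String> candidates, List<Triple> graph, String p) {
        List<String> rightSubjects = new ArrayList<>();
        for (Triple t : graph) {
            if (t.p.equals(p)) {
                rightSubjects.add(t.s);
            }
        }
        List<String> out = new ArrayList<>();
        for (String s : candidates) {
            if (!rightSubjects.contains(s)) {
                out.add(s);
            }
        }
        return out;
    }
}
```

With candidates `Q1`, `Q2`, `Q3` and a graph in which only `Q1` has `P1196` values (two of them), all three methods return `[Q2, Q3]`, but `leftJoin` builds four intermediate rows, two of which exist only to be thrown away by the filter. MINUS and FILTER NOT EXISTS coincide here because both sides share `?s`; in SPARQL, MINUS removes nothing when the two sides share no variable, so the forms are not interchangeable in general. Whether any of this makes the real queries faster in TDB is data dependent, as said above.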
For Q3, does "LIMIT 1" make any difference?
Andy
On 29/07/2021 15:57, Cristóbal Miranda wrote:
Hi Andy,
I have three queries in which this happens:
1.
SELECT ?var1 ?var2 ?var3 ?var4
WHERE {
  ?var1 <http://www.wikidata.org/prop/direct/P570> ?var2 .
  FILTER ( ( ( ?var2 > "2016-07-30T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ) ) ) .
  FILTER ( ( ( ?var2 > ( NOW ( ) - "P32D"^^<http://www.w3.org/2001/XMLSchema#duration> ) ) && ( ?var2 < NOW ( ) ) ) ) .
  ?var1 <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .
  ---------
  OPTIONAL {
    ?var1 <http://www.wikidata.org/prop/direct/P1196> ?var5 .
  }
  FILTER ( ( !( BOUND ( ?var5 ) ) ) ) .

(Andy, inline:) This OPTIONAL/!BOUND pair is the same as
    FILTER NOT EXISTS { ?var1 <http://www.wikidata.org/prop/direct/P1196> ?var5 }
except for a different cardinality. Because ?var5 isn't used elsewhere, is
that what was meant here?

  ?var1 <http://www.wikidata.org/prop/direct/P569> ?var6 .
  FILTER ( ( ( ?var6 > "1954-12-31T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ) ) ) .
  OPTIONAL {
    ?var1 <http://wikiba.se/ontology#statements> ?var3 .
  }
  OPTIONAL {
    ?var1 <http://wikiba.se/ontology#sitelinks> ?var4 .
  }
}
ORDER BY DESC( ?var4 ) DESC( ?var2 ) ASC( ?var1 )
2.
SELECT ?var1 ?var2
WHERE {
  ?var1 <http://www.wikidata.org/prop/direct/P1889> ?var2 .
  MINUS {
    ?var1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#Property> .
  }
}
3.
SELECT DISTINCT ?var1 ?var2 ?var3 ?var4
WHERE {
  ?var1 <http://www.wikidata.org/prop/direct/P569> ?var3 .
  FILTER ( ( ( ?var3 > "1956-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ) && ( ?var3 < "1957-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ) ) ) .
  OPTIONAL {
    ?var1 <http://www.wikidata.org/prop/direct/P570> ?var5 .
  }
  FILTER ( ( !( BOUND ( ?var5 ) ) ) ) .
  ?var2 <http://www.wikidata.org/prop/direct/P569> ?var3 .
  OPTIONAL {
    ?var2 <http://www.wikidata.org/prop/direct/P570> ?var6 .
  }
  FILTER ( ( !( BOUND ( ?var6 ) ) ) ) .
  ?var1 <http://www.w3.org/2000/01/rdf-schema#label> ?var4 .
  ?var2 <http://www.w3.org/2000/01/rdf-schema#label> ?var4 .
  ?var1 <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .
  ?var2 <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .
  ?var1 <http://www.wikidata.org/prop/direct/P21> ?var7 .
  ?var2 <http://www.wikidata.org/prop/direct/P21> ?var7 .
  FILTER ( ( ( STR ( ?var1 ) < STR ( ?var2 ) ) ) ) .
  MINUS {
    ?var1 <http://www.wikidata.org/prop/direct/P7> ?var2 .
  }
  MINUS {
    ?var1 <http://www.wikidata.org/prop/direct/P9> ?var2 .
  }
  MINUS {
    ?var1 <http://www.wikidata.org/prop/direct/P1889> ?var2 .
  }
  MINUS {
    ?var1 <http://www.wikidata.org/prop/direct/P460> ?var2 .
  }
}
LIMIT 500
The first one takes 141 minutes, the second 32 minutes, and the third
is still running.
I have run about 1200 queries, and the exception was thrown in 38 of them,
but, interestingly, not in the first two queries above, which returned 0 and
575925 results respectively.
I'm using jena 4.1.0.
On Thu, 29 Jul 2021 at 05:27, Andy Seaborne wrote:
Hi Cristóbal,
What's the query and which version of jena is this?
Andy
On 28/07/2021 19:39, Cristóbal Miranda wrote:
Hello everyone,
I'm trying to run a sequence of queries with TDB, using a
locally loaded dataset. I don't want to wait more than a few
seconds for each query to finish. My attempt to do this looks like
the following:
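import org.apache.jena.query.QueryCancelledException;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

// try-with-resources closes the execution and the connection, even on timeout
try (RDFConnection rdfConnection = RDFConnectionFactory.connect(dataset);
     QueryExecution queryExecution = rdfConnection.query(query)) {
    // Ask Jena to cancel the query after timeoutMilliseconds
    queryExecution.setTimeout(timeoutMilliseconds);
    ResultSet resultSet = queryExecution.execSelect();
    while (resultSet.hasNext()) {
        QuerySolution querySolution = resultSet.next();
        // ...
    }
} catch (QueryCancelledException e) {
    // ... (thrown when the timeout is triggered)
}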
The problem is that this is not working: the timeout doesn't stop the
query. With htop I can see that the process is stuck in disk operations,
and one of the queries took about 2 hours with the code above. One idea
would be to run the query in a new thread and stop that thread from
outside once the timeout is reached, but I'm almost sure this wouldn't be
a safe way to stop the processing, even if it worked.
Is there a better way to do this?
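One commonly used alternative to killing the thread (`Thread.stop` is deprecated precisely because it is unsafe) is cooperative cancellation: a separate scheduler asks the running execution to stop once the deadline passes, and the query thread finds out the next time the engine checks for cancellation. The sketch below assumes a hypothetical `Abortable` interface standing in for Jena's `QueryExecution` (which does declare a no-argument `abort()`); `QueryWatchdog` and `arm` are illustrative names, not Jena API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for Jena's QueryExecution, reduced to the single
// method this sketch needs (QueryExecution itself declares abort()).
interface Abortable {
    void abort();
}

// Watchdog that asks a running execution to cancel itself at the deadline,
// instead of killing the thread that is iterating over the results.
final class QueryWatchdog implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "query-watchdog");
                t.setDaemon(true); // a pending alarm never keeps the JVM alive
                return t;
            });

    // Schedules execution.abort() after timeoutMillis. Cancel the returned
    // future once the results have been consumed so a finished query is left alone.
    ScheduledFuture<?> arm(Abortable execution, long timeoutMillis) {
        return scheduler.schedule(execution::abort, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Stops the scheduler thread; any alarm that has not fired yet is dropped.
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With Jena, one would presumably pass `queryExecution::abort` to `arm(...)` just before `execSelect()` and cancel the returned future after the result loop. Note that `setTimeout` also relies on the engine noticing the cancellation request, so if the time is being spent somewhere that does not check for it (the disk-bound phase seen in htop, perhaps), this sketch may behave no better; what it does offer is one explicit deadline mechanism that never touches the worker thread directly.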
Cristobal