Hi Andy, Q1 seems to have the same problem, I created a script to generate data for it: https://github.com/CristobalM/jena-query-timeout/blob/main/scripts/generate_q1_optional.py
It gets stuck in getPlan() when there are no matches for the OPTIONAL pattern having ?var5. I tried first with data having that and the timeout worked, for example, at 2 seconds. I'll send you the Visualvm snapshot, On Mon, 2 Aug 2021 at 06:47, Andy Seaborne <[email protected]> wrote: > Hi there, > > Good catch - I can make that happen on my system. > > Recorded as JENA-2139 > > The fix looks quite simple - delay making the index table until first > use - which works from testing so far. PR soon. > > That explains Q2 and Q3 but not Q1 (no MINUS!). > > The same work-during-plan building may be happening - not in the group > (that is a delayed iterator). > > If Q1 returns zero results, then the problem isn't in the sorting code > code itself (that can spend a long time during which it is not reading > query iterators but sorting zero rows is quite quick) but could be in > the need to evaluate the pattern. e.g. FILTER ( ! bound(?var5) ) is > always false. > > Removing parts of the query and seeing what happens may point to the > culprit. > > > and got the result in the attachment. > > Sorry - no attachments on the list. If you still have it, could you send > it directly to me please? > > Andy > > https://issues.apache.org/jira/browse/JENA-2139 > <https://issues.apache.org/jira/browse/JENA-2139> > > On 02/08/2021 03:56, Cristóbal Miranda wrote: > > Hi Andy, > > > > Thanks for the clarifications. > > > > I have a program to run a query with timeout from a tdb dataset > > and a script to generate an nt file for the second query. Because it is > > the simplest among the three, I decided to start with that one. The code > > is here https://github.com/CristobalM/jena-query-timeout > > <https://github.com/CristobalM/jena-query-timeout>. > > > > I think I have spotted a problem using that java code. I put that into a > > new module in the jena project and modified QueryExecutionBase with > > some prints to see that everything is being called. My initial guess > > to do > > this was that there is a concurrency problem with the TimeoutCallback > > and cancel() was never being called on the iterator and from what I saw > > that seems to be the case. > > > > So what happens is that queryIterator = getPlan().iterator() > > at QueryExecutionBase:558 is taking some time to finish and until then, > > queryIterator will remain null. Then, if the timeout occurs before that, > > queryIterator.cancel() won't be called because to do that there is > > a check that queryIterator is not null (QueryExecutionBase:189). > > > > Using `python3 generate_minus_1.py dataset.nt 10000000 9000000` > > for the dataset (and loaded with tdb.tdbloader2) and setting a timeout > > of 30 seconds, the iterator is not cancelled, but with 60 seconds it is > > cancelled. > > > > I tried using 120 seconds of timeout for the full wikidata dataset and > > the Q1 > > won't stop after that time, so I can't be sure if the same is happening > > there. I will continue with that query and see if it is the same and > > let you know what happens. > > > > Also, I did a sampling with VisualVM, using the Q2 and the dataset > > mentioned above and got the result in the attachment. > > > > > > > > > > On Fri, 30 Jul 2021 at 16:33, Andy Seaborne <[email protected] > > <mailto:[email protected]>> wrote: > > > > > > > > On 30/07/2021 18:04, Cristóbal Miranda wrote: > > >> For Q3, does "LIMIT 1" make any difference? > > > > > > > > > No, I'm trying it now and it is taking several minutes at least. > > > > > > > > >> except different cardinality. Because ?var5 isn't used > > eslewhere, is > > >> that waht was meant here? > > > > > > > > > I'm also not too sure about the meaning of the queries in > > general, I'm just > > > using > > > a large dataset of anonymized queries. I took one of the > > > files from here > > https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en > > <https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en> > > > (they say those are from Wikidata's users) and executed a script > > to filter > > > out some types of queries, the result of that is a file with 27 > > million > > > queries. > > > But to answer your question about this specific query, since this > is > > > Wikidata, > > > what I'd do is to run the query on https://query.wikidata.org/ > > <https://query.wikidata.org/>, and infer > > > the meaning > > > of the variables, ?var1 is some person and P570 date of death, > > etc... and > > > from the filters it seems > > > that they want to find those who died in the last 30 days and > > were born > > > after 1954. ?var5 > > > usage seems to indicate that they want to find those whose cause > > of death > > > is not > > > known (because it's an OPTIONAL and !BOUND would filter out > > those where > > > there is a cause of death). > > > In this case it is reasonable that I get no results because the > > dataset I'm > > > using is older than 30 days. > > > So yes, you are right, they are equivalent for this query and > > with NOT > > > EXISTS would > > > probably be faster. But still, what I'd like to have is a way to > > stop the > > > processing > > > after a few seconds consistently, independent of the query. > > > > > > I just did quick peek at the code and found that, correct me if > > I'm wrong, > > > the cancellation of > > > a query is propagated in a top down manner for the > > QueryIterators, in a way > > > that to notice > > > that the query was aborted you have to get the next result (I'm > > looking > > > specifically at QueryIterRepeatApply.java:65), > > > meanwhile the query might be stuck at some heavy processing in a > > lower > > > level, which might be the case for > > > nested OPTIONALs. > > > > Yes - that the mechanism but nested OPTIONALs are optimized to > > "(condition)" which incrementally reads iterators. > > > > In a query plan, ("qparse --print=opt") it shows (condition) not > > (leftjoin). > > > > The (minus) operator may be reading all one side and then missing the > > timeout but it is still advancing one side (the left) > > > > So trying the FILTER NOT EXISTS version of one of the problem queries > > will give us some help in identifying if the problem is MINUS or not. > > > > > In the case of QueryIterMinus I see that in the constructor you > > have to > > > create the index for the right iterator, but > > > that index creation is not stoppable. (On a side note, did you > > try doing > > > QueryIterMinus similar to QueryIterOptionalIndex, > > > using QueryIterRepeatApply? I just have the impression that > > running that > > > subquery with left values bound and checking > > > for only one result could be faster, but this is just a > > suggestion in case > > > you haven't tried, don't take this > > > too seriously). > > > > The right-hand-side index building is once per operator so gets > > reuses > > across the whole of traversing the left. > > > > IndexFactory.create is reading a query iterator. > > > > So on the surface, query iterators are being read and the time > > should be > > seen. > > > > Unless it hits the LinearIndex case which looks expensive. > > > > > > > > If you consider that those are things to be fixed and don't have > > it in high > > > priority I could give it a try and send a PR, > > > if it's ok with you. > > > > That would be excellent but let's establish the problem first to > > know it > > is going to improve things. > > > > If you can point VisualVM/YourKit/... at a slow query and identify > > the > > code hot spots, that would be perfect. > > > > A small example dataset would also be good so we have a minimal, > > executable example. > > > > > > > > Sorry if I seem too intrusive here, but I'm really interested in > > having the > > > timeout work well for any query! > > > > Not at all - it is part of the point of open source. > > > > Andy > > > > > > > > > > > > > > On Fri, 30 Jul 2021 at 04:47, Andy Seaborne <[email protected] > > <mailto:[email protected]>> wrote: > > > > > >> Hi Cristóbal, > > >> > > >> I can't tell from just the queries. > > >> > > >> You say the first query has zero results so it is not the ORDER > BY. > > >> > > >> The queries themselves may be inefficient and changing the > > order may > > >> have a significant effect (all data dependent). > > >> > > >> All three do not use FILTER NOT EXISTS where they could. That > > may well > > >> be faster. > > >> > > >> Q1 uses OPTIONAL !BOUND (see inline), Q2 and Q3 use MINUS. > > >> > > >> For Q3, does "LIMIT 1" make any difference? > > >> > > >> Andy > > >> > > >> On 29/07/2021 15:57, Cristóbal Miranda wrote: > > >>> Hi Andy, > > >>> > > >>> I have three queries in which this happens: > > >>> > > >>> 1. > > >>> > > >>> SELECT ?var1 ?var2 ?var3 ?var4 > > >>>> WHERE { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P570 > > <http://www.wikidata.org/prop/direct/P570>> ?var2 . > > >>>> FILTER ( ( ( ?var2 > "2016-07-30T00:00:00Z"^^< > > >>>> http://www.w3.org/2001/XMLSchema#dateTime > > <http://www.w3.org/2001/XMLSchema#dateTime>> ) ) > > >>>> ) . > > >>>> FILTER ( ( ( ?var2 > ( NOW ( ) - "P32D"^^< > > >>>> http://www.w3.org/2001/XMLSchema#duration > > <http://www.w3.org/2001/XMLSchema#duration>> ) ) && ( ?var2 < > NOW > > >> ( ) > > >>>> ) ) > > >>>> ) . > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P31 > > <http://www.wikidata.org/prop/direct/P31>> < > > >>>> http://www.wikidata.org/entity/Q5 > > <http://www.wikidata.org/entity/Q5>> . > > >> > > >> --------- > > >>>> OPTIONAL { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P1196 > > <http://www.wikidata.org/prop/direct/P1196>> ?var5 . > > >>>> } > > >>>> FILTER ( ( !( BOUND ( ?var5 ) ) ) > > >> > > >> same as > > >> > > >> FILTER > > >> NOT EXISTS { ?var1 <http://www.wikidata.org/prop/direct/P1196 > > <http://www.wikidata.org/prop/direct/P1196>> ?var5 } > > >> > > >> except different cardinality. Because ?var5 isn't used > > eslewhere, is > > >> that waht was meant here? > > >> > > >>>> ) . > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P569 > > <http://www.wikidata.org/prop/direct/P569>> ?var6 . > > >>>> FILTER ( ( ( ?var6 > "1954-12-31T00:00:00Z"^^< > > >>>> http://www.w3.org/2001/XMLSchema#dateTime > > <http://www.w3.org/2001/XMLSchema#dateTime>> ) ) > > >>>> ) . > > >>>> OPTIONAL { > > >>>> ?var1 <http://wikiba.se/ontology#statements > > <http://wikiba.se/ontology#statements>> ?var3 . > > >>>> } > > >>>> OPTIONAL { > > >>>> ?var1 <http://wikiba.se/ontology#sitelinks > > <http://wikiba.se/ontology#sitelinks>> ?var4 . > > >>>> } > > >>>> } > > >>>> ORDER BY DESC( ?var4 ) DESC( ?var2 )ASC( ?var1 ) > > >>>> > > >>>> > > >>> 2. > > >>> > > >>>> SELECT ?var1 ?var2 > > >>>> WHERE { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P1889 > > <http://www.wikidata.org/prop/direct/P1889>> ?var2 . > > >>>> MINUS { > > >>>> ?var1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type > > <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>> < > > >>>> http://wikiba.se/ontology#Property > > <http://wikiba.se/ontology#Property>> . > > >>>> } > > >>>> } > > >>>> > > >>>> > > >>> 3. > > >>> > > >>>> SELECT DISTINCT ?var1 ?var2 ?var3 ?var4 > > >>>> WHERE { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P569 > > <http://www.wikidata.org/prop/direct/P569>> ?var3 . > > >>>> FILTER ( ( ( ?var3 > "1956-01-01T00:00:00Z"^^< > > >>>> http://www.w3.org/2001/XMLSchema#dateTime > > <http://www.w3.org/2001/XMLSchema#dateTime>> ) && ( ?var3 < > > >>>> > > "1957-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime > > <http://www.w3.org/2001/XMLSchema#dateTime>> > > >> ) ) > > >>>> ) . > > >>>> OPTIONAL { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P570 > > <http://www.wikidata.org/prop/direct/P570>> ?var5 . > > >>>> } > > >>>> FILTER ( ( !( BOUND ( ?var5 ) ) ) > > >>>> ) . > > >>>> ?var2 <http://www.wikidata.org/prop/direct/P569 > > <http://www.wikidata.org/prop/direct/P569>> ?var3 . > > >>>> OPTIONAL { > > >>>> ?var2 <http://www.wikidata.org/prop/direct/P570 > > <http://www.wikidata.org/prop/direct/P570>> ?var6 . > > >>>> } > > >>>> FILTER ( ( !( BOUND ( ?var6 ) ) ) > > >>>> ) . > > >>>> ?var1 <http://www.w3.org/2000/01/rdf-schema#label > > <http://www.w3.org/2000/01/rdf-schema#label>> ?var4 . > > >>>> ?var2 <http://www.w3.org/2000/01/rdf-schema#label > > <http://www.w3.org/2000/01/rdf-schema#label>> ?var4 . > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P31 > > <http://www.wikidata.org/prop/direct/P31>> < > > >>>> http://www.wikidata.org/entity/Q5 > > <http://www.wikidata.org/entity/Q5>> . > > >>>> ?var2 <http://www.wikidata.org/prop/direct/P31 > > <http://www.wikidata.org/prop/direct/P31>> < > > >>>> http://www.wikidata.org/entity/Q5 > > <http://www.wikidata.org/entity/Q5>> . > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P21 > > <http://www.wikidata.org/prop/direct/P21>> ?var7 . > > >>>> ?var2 <http://www.wikidata.org/prop/direct/P21 > > <http://www.wikidata.org/prop/direct/P21>> ?var7 . > > >>>> FILTER ( ( ( STR ( ?var1 ) < STR ( ?var2 ) ) ) > > >>>> ) . > > >>>> MINUS { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P7 > > <http://www.wikidata.org/prop/direct/P7>> ?var2 . > > >>>> } > > >>>> MINUS { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P9 > > <http://www.wikidata.org/prop/direct/P9>> ?var2 . > > >>>> } > > >>>> MINUS { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P1889 > > <http://www.wikidata.org/prop/direct/P1889>> ?var2 . > > >>>> } > > >>>> MINUS { > > >>>> ?var1 <http://www.wikidata.org/prop/direct/P460 > > <http://www.wikidata.org/prop/direct/P460>> ?var2 . > > >>>> } > > >>>> } > > >>>> LIMIT 500 > > >>> > > >>> > > >>> > > >>> The first one is taking 141 minutes, the second 32 minutes > > and the third > > >>> is still running. > > >>> I have run about 1200 queries, where 38 times the exception > > was thrown, > > >> but > > >>> interestingly not > > >>> in the first two queries, where I got 0 and 575925 results > > respectively. > > >>> > > >>> I'm using jena 4.1.0. > > >>> > > >>> > > >>> > > >>> On Thu, 29 Jul 2021 at 05:27, Andy Seaborne <[email protected] > > <mailto:[email protected]>> wrote: > > >>> > > >>>> Hi Cristóbal, > > >>>> > > >>>> What's the query and which version of jena is this? > > >>>> > > >>>> Andy > > >>>> > > >>>> On 28/07/2021 19:39, Cristóbal Miranda wrote: > > >>>>> Hello everyone, > > >>>>> > > >>>>> I'm trying to run a sequence of queries with TDB, using a > > >>>>> locally loaded dataset. I don't want to wait more than a few > > >>>>> seconds for each query to finish. My attempt to do this > > looks like > > >>>>> the following: > > >>>>> > > >>>>>> > > >>>>>> try { > > >>>>>> RDFConnection rdfConnection = > > >> RDFConnectionFactory.connect(dataset); > > >>>>>> QueryExecution queryExecution = > > rdfConnection.query(query); > > >>>>>> queryExecution.setTimeout(timeoutMilliseconds); > > >>>>>> ResultSet resultSet = queryExecution.execSelect(); > > >>>>>> while (resultSet.hasNext()) { > > >>>>>> QuerySolution querySolution = resultSet.next(); > > >>>>>> ... > > >>>>>> } > > >>>>>> } catch (QueryCancelledException e) { > > >>>>>> ... > > >>>>>> } > > >>>>> > > >>>>> > > >>>>> The problem is that this is not working. With htop I see that > > >>>>> the process gets stuck in disk operations. One of the queries > > >>>>> took about 2 hours with the code above. An idea would > > >>>>> be trying to run this in a new thread and stopping the > > thread outside > > >>>> once > > >>>>> the timeout is reached, but I'm almost sure this wouldn't be > > a safe > > >>>>> way to stop the processing, even if it worked. > > >>>>> > > >>>>> Is there a better way to do this? > > >>>>> > > >>>>> Cristobal > > >>>>> > > >>>> > > >>> > > >> > > > > > >
