Hi Andy. More experiments this morning. I originally only sent you a small part of a larger query, just to expose the problem in its simplest form, and your switches work well in that case (i.e. the first formulation below *with* the comments).

But... there's a problem when using the switches, in that the rest of the query wants to get the rdfs:label and various other properties. This destroys the performance gains. I've tried "yours" and "mine" with and without the switches, and then the separate parts on their own to see how that goes.

1) "yours"
==========
This formulation (with the switches and comments in place) - 384 ms

SELECT ?score ?ent ?entLabel ?lat ?long ?point ?pointType ?pointLabel
WHERE {
    { ?ent spatial:nearby(51.507999420166016 -0.10999999940395355 70.8018078804016'km') }
    { (?ent ?score) text:query ('environment' 'lang:en') .
      FILTER EXISTS {?ent rdf:type iotic:Entity} }

#    OPTIONAL {
#        ?ent rdfs:label ?entLabel .
#        FILTER langMatches( lang(?entLabel), 'en' ) .
#    }
#
#    OPTIONAL {?ent geo:lat ?lat . ?ent geo:long ?long}
#    ?ent iotic:Advertises ?point .
#    ?point rdf:type iotic:Point .
#    ?point iotic:PointType ?pointType .
#
#    OPTIONAL {
#        ?point rdfs:label ?pointLabel .
#        FILTER langMatches( lang(?pointLabel), 'en' ) .
#    }
}

Uncomment the lines and the performance drops to - 7.165 ms

2) "mine"
=========
The below formulation with the switches in place - 11.221 secs
The below without the switches - 5.371 secs

SELECT ?score ?ent ?entLabel ?lat ?long ?point ?pointType ?pointLabel
WHERE {
    ?ent spatial:nearby(51.507999420166016 -0.10999999940395355 70.8018078804016'km') .
    (?ent ?score) text:query ('environment' 'lang:en') .
    FILTER EXISTS {?ent rdf:type iotic:Entity} .

    OPTIONAL {
        ?ent rdfs:label ?entLabel .
        FILTER langMatches( lang(?entLabel), 'en' ) .
    }

    OPTIONAL {?ent geo:lat ?lat . ?ent geo:long ?long}
    ?ent iotic:Advertises ?point .
    ?point rdf:type iotic:Point .
    ?point iotic:PointType ?pointType .

    OPTIONAL {
        ?point rdfs:label ?pointLabel .
        FILTER langMatches( lang(?pointLabel), 'en' ) .
    }
}

3) Separately
==============

Completely on their own:
========================
i.e. just the ?ent spatial:nearby line
the spatial query on its own takes 50 ms

i.e. just the text:query line
and the text on its own takes 258 ms

With the OPTIONAL {} and other properties
=========================================
Spatial and other properties 135 ms
Text and other properties 854 ms

Again, repeated thanks for your help.

Mark

Technology Lead, Iotic Labs
mark.whar...@iotic-labs.com
https://www.iotic-labs.com

On 22/12/15 17:22, Andy Seaborne wrote:
> Mark,
>
> Thanks for the experiment results.
>
> On 22/12/15 15:47, Mark Wharton wrote:
>> Query below run without Andy's switches.
>> INFO [5] 200 OK (4.985 s)
>>
>> Query below run with Andy's switches.
>> INFO [1] 200 OK (840 ms)
>>
>> Them's some magic switches. Thanks, Andy.
>>
>> Do they have any impact (negative or positive) on any other SPARQL
>> operations? I'm only curious as you've solved our main problem in that
>> our "search" query was very slow. There's nowhere else that uses the
>> text and spatial indexes for retrieval.
>
> This depends on an internal change in the latest release (Jena 3.0.1,
> Fuseki 2.3.1). Prior to that it will not make the same difference.
> Specifically, unoptimized joins are now hash joins.
>
> But that is not a good choice for the "?ent rdf:type iotic:Entity"
> triple pattern. The system can't distinguish different cases involving
> external indexes as it knows not very much about the index details.
>
> Adding
>
> FILTER EXISTS { ?ent rdf:type iotic:Entity }
>
> might work because the triple pattern is really a check, not a match
> setting a variable.
>
> A plain "?ent rdf:type iotic:Entity" will retrieve all things of that
> class regardless of spatial and text query when those optimizations are off.
>
> Andy
>
>>
>> Many thanks for this help so close to the holiday season. Happy
>> holidays to you all at Jena - keep up the good work.
>>
>> Mark
>>
>>
>> Technology Lead, Iotic Labs
>> +44 7973 674404
>> mark.whar...@iotic-labs.com
>> https://www.iotic-labs.com
>>
>> On 22/12/15 11:49, Andy Seaborne wrote:
>>> Mark - here is another way.
>>>
>>> This query:
>>>
>>> SELECT ?score ?ent
>>> WHERE {
>>>   { ?ent spatial:nearby ( .... ) }
>>>   { ?ent text:query ( ..... ) }
>>>   # No ?ent rdf:type iotic:Entity .
>>>   # This focuses the query on the presenting issue.
>>> }
>>>
>>> and then run Fuseki with the following flags:
>>>
>>> --set arq:optIndexJoinStrategy=false --set arq:optMergeBGPs=false
>>>
>>> for however you are running the server.
>>>
>>> You need both --set
>>>
>>> The service script will not do this very easily - if environment
>>> variable FUSEKI_ARGS is set it might do. Untested.
>>>
>>> It is easier to run the server standalone:
>>>
>>> (Linux, Mac)
>>>
>>> The "fuseki-server" script should pass these in:
>>>
>>> fuseki-server \
>>>   --set arq:optIndexJoinStrategy=false --set arq:optMergeBGPs=false \
>>>   .. other args ..
>>>
>>> (Windows or any platform)
>>>
>>> You can call the server java code directly: all one line:
>>>
>>>
>>> java -Xmx1200M -jar fuseki-server.jar --set arq:optIndexJoinStrategy=false --set arq:optMergeBGPs=false .. other args ..
>>>
>>> you'll need to put the full path name of fuseki-server.jar
>>>
>>> Sorry - I don't have your setup to test this fully. I did make sure that
>>> the reworked query does lead to an execution plan that is different and
>>> should yield some information about the situation.
>>>
>>> Andy
>>>
>>> On 22/12/15 09:50, Andy Seaborne wrote:
>>>> On 22/12/15 07:06, Mark Wharton wrote:
>>>>> Ah, wheels within wheels.
>>>>>
>>>>> The formulation with the filter in it is fine, except that if you want
>>>>> to search for more than one word or you match in label and comment
>>>>> then the UNION formulation returns you duplicate rows.
>>>>> This isn't a problem with the Lucene search, which is why (I now
>>>>> remember) I used it in the first place.
>>>>>
>>>>> I'm not sure what version of jena I'm using - I just use the fuseki
>>>>> release at 2.3.0. Is there a way to find out?
>>>>
>>>> 3.0.0
>>>>
>>>> Many of the java commands support --version and the fuseki-server jar
>>>> is an all-in-one jar:
>>>>
>>>> java -cp <YourInstall>/fuseki-server.jar arq.sparql --version
>>>>
>>>>> What's the status on the JENA-999 and JENA-1093 issues? I see there's
>>>>> been some activity on 999 in the last few days. Andy Seaborne's last
>>>>> comment seems encouraging.
>>>>>
>>>>> I don't want to adopt a single version as I'll be stuck forever
>>>>> patching back and forward and it will break eventually.
>>>>>
>>>>> Many thanks for your continued help.
>>>>
>>>> JENA-999 may sort of help but I'm not that positive because each ?ent
>>>> from the first part will be different going into the second part. It
>>>> looks to me as if it is the overhead of going out to Lucene. (This is
>>>> Lucene right? not Solr?)
>>>>
>>>> The ideal is some super compilation of the text:query and spatial query
>>>> into one big Lucene query.
>>>>
>>>> What would also be good is to stop the general optimizer (this is
>>>> nothing to do with TDB) using an index join. Except that is the better
>>>> choice for the rdf:type. This is what the additional {} were trying for
>>>> except the optimizer outsmarted
>>>>
>>>> SELECT ?score ?ent
>>>> WHERE {
>>>>   ?ent spatial:nearby( ...) .
>>>>   (?ent ?score) text:query (...) .
>>>>   ?ent rdf:type iotic:Entity .
>>>> }
>>>>
>>>>
>>>>
>>>> Mark - can you ask the query from Java? If so,
>>>>
>>>> Add "Optimize.noOptimizer();" before executing the query. I can't see
>>>> a way to do that from setting the environment for Fuseki.
>>>>
>>>> Or (the effect on time of this is version specific and whether it does
>>>> anything useful is a big "maybe") you could try this:
>>>>
>>>> SELECT ?score ?ent
>>>> WHERE {
>>>>   { OPTIONAL { ?ent spatial:nearby "ABC" . }}
>>>>   { OPTIONAL { ?ent text:query "DEF" } }
>>>> }
>>>>
>>>> Andy
>>>>
>>>
>>>
>
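
For anyone repeating the timing comparisons in this thread, a small script can generate the query variants instead of commenting and uncommenting lines by hand. What follows is only a rough sketch in Python (standard library only) and was not used by anyone in the thread. `build_query` and `time_query` are hypothetical helper names. The PREFIX declarations are left for the caller to pass in, because the iotic: namespace is not given anywhere above. The property block is a lightly tidied copy of the one in the full query.

```python
import time
import urllib.parse
import urllib.request

# Patterns taken from the queries quoted in this thread.
SPATIAL = ("?ent spatial:nearby(51.507999420166016 -0.10999999940395355 "
           "70.8018078804016'km')")
TEXT = ("(?ent ?score) text:query ('environment' 'lang:en') . "
        "FILTER EXISTS { ?ent rdf:type iotic:Entity }")
PROPERTIES = """\
  OPTIONAL {
    ?ent rdfs:label ?entLabel .
    FILTER langMatches( lang(?entLabel), 'en' )
  }
  OPTIONAL { ?ent geo:lat ?lat . ?ent geo:long ?long }
  ?ent iotic:Advertises ?point .
  ?point rdf:type iotic:Point .
  ?point iotic:PointType ?pointType .
  OPTIONAL {
    ?point rdfs:label ?pointLabel .
    FILTER langMatches( lang(?pointLabel), 'en' )
  }
"""


def build_query(spatial=True, text=True, properties=True, grouped=True,
                prefixes=""):
    """Assemble one variant of the search query.

    grouped=True wraps each index lookup in its own { } group (the "yours"
    formulation); grouped=False writes them as plain patterns ("mine").
    prefixes must hold the PREFIX lines; none are assumed here.
    """
    patterns = [p for p, wanted in ((SPATIAL, spatial), (TEXT, text)) if wanted]
    if grouped:
        lines = ["  { %s }" % p for p in patterns]
    else:
        lines = ["  %s ." % p for p in patterns]
    body = "\n".join(lines) + "\n"
    if properties:
        body += PROPERTIES
    return ("%s\nSELECT ?score ?ent ?entLabel ?lat ?long ?point ?pointType "
            "?pointLabel\nWHERE {\n%s}\n" % (prefixes, body))


def time_query(endpoint, query):
    """POST the query (SPARQL protocol, form-encoded); return elapsed seconds."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=data,
        headers={"Accept": "application/sparql-results+json"})
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # drain the results so transfer time is counted
    return time.perf_counter() - start
```

As an example, `time_query("http://localhost:3030/ds/query", build_query(properties=False, prefixes=my_prefixes))` would time the grouped formulation without the property lookups. The URL is an assumed local Fuseki address with a placeholder dataset name `ds`, and `my_prefixes` stands for whatever PREFIX block your data needs. Timings measured this way include HTTP overhead, so they will not match the Fuseki log figures exactly.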