Re: SPARQL performance question

Andy Seaborne Tue, 25 Feb 2020 06:54:46 -0800

It might be worth reordering the triple patterns and/or putting in some clustering: there is a large amount of cross product being done, which means many, many unwanted or duplicate pieces of work.


For example, move the rdf:type patterns to the end (do you need them at all?)


    Andy

(Replaced long URIs for email:)

?leftA    <#simplexConnectTo>  ?connectionAA .
?connectionAA <#simplexConnectTo>  ?rightA .

?leftA    <#simplexConnectTo>  ?connectionAB .
?connectionAB <#simplexConnectTo>  ?rightB .

?leftB    <#simplexConnectTo>  ?connectionBA .
?connectionBA <#simplexConnectTo>  ?rightA .

?leftB    <#simplexConnectTo>  ?connectionBB .
?connectionBB <#simplexConnectTo>  ?rightB .

?connectionAA <fhowl/singlepointfailpattern#boundTo>  ?singleHardware .
?connectionBA <fhowl/singlepointfailpattern#boundTo>  ?singleHardware .

?connectionAA rdf:type <#portConnection> .
?connectionAB rdf:type <#portConnection> .
?connectionBA rdf:type <#portConnection> .
?connectionBB rdf:type <#portConnection> .

?leftA    rdf:type              <#thread> .
?leftB    rdf:type              <#thread> .
?rightA   rdf:type              <#thread> .
?rightB   rdf:type              <#thread> .
?singleHardware rdf:type              <#platform> .
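
To illustrate why the order matters, here is a toy sketch in plain Java. It is not how ARQ or an inference model actually evaluates a query; it is a naive nested-loop model over a hypothetical four-pattern query and an invented chain-shaped graph, and the class and method names are made up for illustration. It only counts how many intermediate rows each ordering builds before reaching the same answers.

```java
public class JoinOrderSketch {
    /** Final matches plus the number of intermediate rows built on the way. */
    static final class Cost {
        final long matches;
        final long intermediateRows;
        Cost(long matches, long intermediateRows) {
            this.matches = matches;
            this.intermediateRows = intermediateRows;
        }
    }

    // Toy query (hypothetical, far smaller than the one in this thread):
    //   ?a :type T .  ?b :type T .  ?a :edge ?c .  ?c :edge ?b .

    /** Type patterns first: ?a and ?b share no variable yet, so they form a cross product. */
    static Cost typesFirst(boolean[] isT, boolean[][] edge) {
        int n = isT.length;
        long rows = 0, matches = 0;
        for (int a = 0; a < n; a++) {
            if (!isT[a]) continue;
            rows++;                                  // row after ?a :type T
            for (int b = 0; b < n; b++) {
                if (!isT[b]) continue;
                rows++;                              // row after ?b :type T (cross product)
                for (int c = 0; c < n; c++) {
                    if (!edge[a][c]) continue;
                    rows++;                          // row after ?a :edge ?c
                    if (edge[c][b]) { rows++; matches++; } // row after ?c :edge ?b
                }
            }
        }
        return new Cost(matches, rows);
    }

    /** Connected patterns first: every step is constrained by an already-bound variable. */
    static Cost connectionsFirst(boolean[] isT, boolean[][] edge) {
        int n = isT.length;
        long rows = 0, matches = 0;
        for (int a = 0; a < n; a++) {
            for (int c = 0; c < n; c++) {
                if (!edge[a][c]) continue;
                rows++;                              // row after ?a :edge ?c
                for (int b = 0; b < n; b++) {
                    if (!edge[c][b]) continue;
                    rows++;                          // row after ?c :edge ?b
                    if (!isT[a]) continue;
                    rows++;                          // row after ?a :type T
                    if (isT[b]) { rows++; matches++; } // row after ?b :type T
                }
            }
        }
        return new Cost(matches, rows);
    }

    /** Invented test data: n nodes, all of type T, linked in a chain 0 -> 1 -> ... -> n-1. */
    static boolean[][] chain(int n) {
        boolean[][] edge = new boolean[n][n];
        for (int i = 0; i + 1 < n; i++) edge[i][i + 1] = true;
        return edge;
    }

    public static void main(String[] args) {
        int n = 10;
        boolean[] isT = new boolean[n];
        java.util.Arrays.fill(isT, true);
        Cost slow = typesFirst(isT, chain(n));
        Cost fast = connectionsFirst(isT, chain(n));
        System.out.println("types first:       " + slow.intermediateRows + " rows, " + slow.matches + " matches");
        System.out.println("connections first: " + fast.intermediateRows + " rows, " + fast.matches + " matches");
    }
}
```

Both orderings return the same matches; only the amount of intermediate work differs, and the gap grows roughly quadratically with the number of typed nodes in this toy model.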





On 24/02/2020 10:01, Rob Vesse wrote:

To add to what else has been said:

Query execution in Apache Jena ARQ is based upon lazy evaluation wherever 
possible.  Calling execSelect() simply prepares a ResultSet that is capable of 
delivering the results but doesn't actually evaluate the query and produce any 
results until you call hasNext()/next().  When you call either of these methods 
then ARQ does the minimum amount of work to return the next result (or batch of 
results) depending on the underlying algebra of the query.
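
As a rough analogy only, the behaviour described above can be sketched with a plain Java Iterator. This is not Jena code: execSelectLike and the work counter are hypothetical stand-ins, and ARQ's real iterators are considerably more involved. The point is just that creating the iterator does no matching work, and each hasNext() call does only enough work to buffer one more row.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyResultsSketch {
    /**
     * Hypothetical stand-in for execSelect(): returns immediately and
     * computes each row only when the caller asks for it.
     */
    static Iterator<Integer> execSelectLike(int rows, AtomicInteger workDone) {
        return new Iterator<Integer>() {
            private int produced = 0;
            private Integer buffered = null;

            @Override
            public boolean hasNext() {
                // The "expensive" step happens here, not when the iterator is created.
                if (buffered == null && produced < rows) {
                    workDone.incrementAndGet();
                    buffered = produced++;
                }
                return buffered != null;
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                Integer row = buffered;
                buffered = null;
                return row;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger workDone = new AtomicInteger();
        Iterator<Integer> results = execSelectLike(3, workDone);
        System.out.println("work after creating the iterator: " + workDone.get());
        while (results.hasNext()) {
            int row = results.next();
            System.out.println("row " + row + ", work so far: " + workDone.get());
        }
    }
}
```

In this sketch any time measured around the creation call is negligible, while the time spent in hasNext() reflects the real cost of producing each row, which matches where the time shows up in the original report.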

Rob

On 23/02/2020, 18:58, "Steve Vestal" <steve.ves...@adventiumlabs.com> wrote:

     I'm looking for suggestions on a SPARQL performance issue.  My test
     model has ~800 sentences, and processing of one select query takes about
     25 minutes.  The query is a basic graph pattern with 9 variables and 20
     triples, plus a filter that forces distinct variables to have distinct
     solutions using pair-wise not-equals constraints.  No OPTIONAL clause or
     anything else fancy.

     I am issuing the query against an inference model.  Most of the asserted
     sentences are in imported models.  If I iterate over all the statements
     in the OntModel, I get ~1500 almost instantly.  I experimented with
     several of the reasoners.

     Below is the basic control flow.  The thing I found curious is that the
     execSelect() method finishes almost instantly.  It is the iteration over
     the ResultSet that is taking all the time, apparently in the call to
     selectResult.hasNext().  The result has 192 rows, 9 columns.  The results
     are provided in bursts of 8 rows each, with ~1 minute between bursts.

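     // Imports assumed for Apache Jena 3.x
     import org.apache.jena.ontology.OntModel;
     import org.apache.jena.query.QueryExecution;
     import org.apache.jena.query.QueryExecutionFactory;
     import org.apache.jena.query.QuerySolution;
     import org.apache.jena.query.ResultSet;
     import org.apache.jena.rdf.model.RDFNode;

     OntModel ontologyModel = getMyOntModel(); // Tried various reasoners
     String selectQuery = getMySelectQuery();
     QueryExecution selectExec =
         QueryExecutionFactory.create(selectQuery, ontologyModel);
     ResultSet selectResult = selectExec.execSelect();
     while (selectResult.hasNext()) {  // Time seems to be spent in hasNext
         QuerySolution selectSolution = selectResult.next();
         for (String var : getMyVariablesOfInterest()) {
             RDFNode varValue = selectSolution.get(var);
             // process varValue
         }
     }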

Any suggestions would be appreciated.
