Steve, please give us an update here about what you've learned once you have
found suitable optimization strategies to improve your SPARQL query. This
will give closure to the question here on the mailing list and will help
guide future readers.


On Mon, Feb 24, 2020 at 5:33 PM Steve Vestal <steve.ves...@adventiumlabs.com>
wrote:

> With some advice from Dave, I made a copy of the OntModel that hopefully
> materialized the full entailment closure:
>
>         Model entailedModel = ModelFactory.createDefaultModel();   // plain model, no reasoner
>         entailedModel.add(ontologyModel);   // copying materializes asserted + entailed statements
>
> in less than one second, the results were:
>
>     Statements in ontology model: 1146
>     Entailed model org.apache.jena.rdf.model.impl.ModelCom size  4453
>
> I ran the select query on this entailed model.  It still takes about 25
> minutes.
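>
> (For the record, the query was then run against the plain copy along
> these lines; just a sketch, with the actual query text and per-row
> processing as in my earlier mail:)
>
>         QueryExecution entailedExec =
>                 QueryExecutionFactory.create(getMySelectQuery(), entailedModel);
>         ResultSet entailedResult = entailedExec.execSelect();
>         while (entailedResult.hasNext()) {
>             QuerySolution solution = entailedResult.next();
>             // same per-row processing as before
>         }
>         entailedExec.close();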
>
> I see there is a chapter on Query Efficiency and Debugging in DuCharme's
> book. Now seems like a good time for me to read that chapter.
>
> Thanks for all the help.
>
> On 2/24/2020 3:02 AM, Dave Reynolds wrote:
> > On 23/02/2020 23:11, Steve Vestal wrote:
> >> If I comment out the FILTER clause that prevents variable aliasing, the
> >> query is processed almost immediately.  The number of rows goes from 192
> >> to 576, but it's fast.
> >
> > Interesting. That does suggest it might actually be SPARQL rather than
> > inference that's the bottleneck. The materialization experiment will
> > be a test of that.
> >
> > Though looking at your query I wonder if you need inference at all -
> > we can't see your data to be sure since the list doesn't allow
> > attachments.
> > Have you tried without any inference? Do you know what inference you
> > are relying on?
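> >
> > A quick way to check that (a sketch only, not tested against your
> > data) is to load the same documents into an OntModel spec with no
> > reasoner attached and run the same SELECT against it:
> >
> >     // OWL_MEM attaches no reasoner, so only asserted statements are visible
> >     OntModel noInf = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
> >     noInf.read(myOntologyDocumentIRI);  // hypothetical: load however you normally do
> >     // then run the existing SELECT against noInf and compare the row count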
> >
> >> What is the proper way to write a query when you
> >> want a particular set of variables to have distinct solution values?
> >
> > Not sure there is a better way in general. However, I wonder if you
> > can partition your query into subgroups, filter within the groups, and
> > then do a simpler join on the results. That might reduce the
> > combinatorics.
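> >
> > Roughly the shape I have in mind (a sketch with made-up predicates and
> > variables, since we can't see the real query):
> >
> >     String partitioned =
> >         "PREFIX : <http://example.org/>\n" +
> >         "SELECT ?a ?b ?c WHERE {\n" +
> >         "  { SELECT ?a ?b WHERE { ?a :p ?b . FILTER(?a != ?b) } }\n" +
> >         "  { SELECT ?a ?c WHERE { ?a :q ?c . FILTER(?a != ?c) } }\n" +
> >         "  FILTER(?b != ?c)\n" +
> >         "}";
> >     // each sub-select filters within its own group; only the cross-group
> >     // inequality is left for the outer join to check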
> >
> > However, I don't understand your query nor the modelling (especially
> > around simplexConnect, which looks odd) so might be wrong about that.
> >
> >> I speculated that when I iterated over the statements in the OntModel,
> >> and the number went from a model size() of ~1500 to ~4700 iterated
> >> statements, that I was materializing the entire inference closure (which
> >> was fast).  Is there some other set of calls needed to do that?
> >
> > The Jena inference engines support a mix of forward and backward
> > inference rules. The forward inference rules will run once and store
> > all the results. That's the growth you are probably seeing. That's
> > then efficient to query.
> >
> > The backward rules are run on demand. They generally (this is
> > controllable) cache the results of the particular triple patterns that
> > are requested. Because they only cache against the specific patterns
> > ("goals") they see, then depending on what order the goals come in,
> > you can get cases where there's redundancy in those caches. Those
> > caches aren't particularly well indexed either. You can certainly
> > query one way and fill up one set of caches, but then a different
> > query asks for different patterns and more rules still need to fire.
> >
> > *If* multiple overlapping caches in the backward rules is the issue
> > *then* materializing everything and not using inference after that
> > can help. It's a balance of whether you are going to query for most of
> > the data or just do a bunch of point probes. In the former case it's
> > better to work everything out once. In the latter case better to use
> > on demand rules.
> >
> > Your query pattern looks like it's going to touch everything.
> >
> >> Are there circumstances where it is faster to materialize the entire
> >> closure and query a plain model than to query the inference model
> >> itself?
> >
> > Yes, see earlier message, and above.
> >
> > Dave
> >
> >> On 2/23/2020 3:33 PM, Dave Reynolds wrote:
> >>> The issue is not the performance of SPARQL but the performance of the
> >>> inference engines.
> >>>
> >>> If you need some OWL inference then your best bet is OWLMicro.
> >>>
> >>> If that's too slow to query directly then one option to try is to
> >>> materialize the entire inference closure and then query that. You can
> >>> do that by simply copying the inference model to a plain model.
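> >>>
> >>> A rough sketch of that combination (OWLMicro plus a one-off copy),
> >>> assuming your asserted data is already in a Model called baseModel:
> >>>
> >>>     // attach the OWLMicro rule reasoner to the asserted data
> >>>     OntModel inf = ModelFactory.createOntologyModel(
> >>>             OntModelSpec.OWL_MEM_MICRO_RULE_INF, baseModel);
> >>>     // copying into a plain model materializes the entailments once
> >>>     Model snapshot = ModelFactory.createDefaultModel();
> >>>     snapshot.add(inf);
> >>>     // query snapshot rather than inf from here on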
> >>>
> >>> If that's too slow then you'll need a higher performance third party
> >>> reasoner.
> >>>
> >>> Dave
> >>>
> >>> On 23/02/2020 18:57, Steve Vestal wrote:
> >>>> I'm looking for suggestions on a SPARQL performance issue.  My test
> >>>> model has ~800 sentences, and processing of one select query takes
> >>>> about 25 minutes.  The query is a basic graph pattern with 9
> >>>> variables and 20 triples, plus a filter that forces distinct
> >>>> variables to have distinct solutions using pair-wise not-equals
> >>>> constraints.  No OPTIONAL clause or anything else fancy.
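> >>>>
> >>>> The distinctness filter has this general shape (not the real query,
> >>>> just illustrative variable and predicate names):
> >>>>
> >>>>           String shape =
> >>>>               "PREFIX : <http://example.org/>\n" +
> >>>>               "SELECT ?v1 ?v2 ?v3 WHERE {\n" +
> >>>>               "  ?v1 :p1 ?v2 . ?v2 :p2 ?v3 .\n" +   // ... about 20 such triples
> >>>>               "  FILTER(?v1 != ?v2 && ?v1 != ?v3 && ?v2 != ?v3)\n" +   // one inequality per pair
> >>>>               "}";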
> >>>>
> >>>> I am issuing the query against an inference model.  Most of the
> >>>> asserted sentences are in imported models.  If I iterate over all
> >>>> the statements in the OntModel, I get ~1500 almost instantly.  I
> >>>> experimented with several of the reasoners.
> >>>>
> >>>> Below is the basic control flow.  The thing I found curious is that
> >>>> the execSelect() method finishes almost instantly.  It is the
> >>>> iteration over the ResultSet that is taking all the time, apparently
> >>>> in the call to selectResult.hasNext().  The result has 192 rows and
> >>>> 9 columns.  The results are provided in bursts of 8 rows each, with
> >>>> ~1 minute between bursts.
> >>>>
> >>>>           OntModel ontologyModel = getMyOntModel(); // tried various reasoners
> >>>>           String selectQuery = getMySelectQuery();
> >>>>           QueryExecution selectExec =
> >>>>                   QueryExecutionFactory.create(selectQuery, ontologyModel);
> >>>>           ResultSet selectResult = selectExec.execSelect();
> >>>>           while (selectResult.hasNext()) {  // time seems to be spent in hasNext
> >>>>               QuerySolution selectSolution = selectResult.next();
> >>>>               for (String var : getMyVariablesOfInterest()) {
> >>>>                   RDFNode varValue = selectSolution.get(var);
> >>>>                   // process varValue
> >>>>               }
> >>>>           }
> >>>>
> >>>> Any suggestions would be appreciated.
> >>>>
> >>
>


-- 


---
Marco Neumann
KONA
