Hi Arthur,

as a meta comment first, please keep in mind that you will find additional support for Jena-related questions on the jena-dev mailing list. The experts there might have additional and better insights into query performance over union graphs.

Having said this, I believe the underlying problem will remain: the native database indexes cannot be used efficiently, and that is the same whether you have 2 or 100 sub-graphs. The issue would go away if you either manage to tell your database to create optimized indexes for multiple named graphs, or if you just have a single graph to begin with. But with graphs the size of 20k triples I don't see why you are using a database at all. You may have a database as backup (and persistent storage), but for optimized query performance, you may want to load all triples into memory (or use TDB, which may do that for you on demand). Another thought is to leave everything in a single database and have queries that do some massive filtering right in the first graph pattern (WHERE clause).

Holger

On Aug 10, 2010, at 8:46 AM, Arthur Keen wrote:

> Holger,
>
> You mention that this performance issue shows up when you query large
> data sets and many merged graphs and that SPARQL cannot be optimized to
> perform a complex query as a single optimized operation, because it may have
> to dynamically merge partial query results from these subgraphs.
> I would like to avoid this problem:
> I am using a strategy where I use aggregate owl models that import a number
> of small owl models (10-30k statements) that each import the same owl schema
> model. I have done things this way in order to easily group the smaller
> models (by time, customer, region, etc) for combined queries while keeping a
> small footprint for the individual models. Will queries against the combined
> models suffer this optimization issue? Would it be better to store the
> smaller models as named graphs in SDB and then issue SPARQL queries
> against Union Graphs of these smaller models? Or would it be better to store
> all the submodels in one large graph?
>
> regards
>
> Arthur
> On Fri, Mar 26, 2010 at 4:52 PM, Holger Knublauch <[email protected]>
> wrote:
> Yeah, this is a recurring problem that nearly every advanced user (i.e. with
> large data sets and many merged graphs) runs into sooner or later. It's a
> matter of re-architecting the Jena graphs to exploit optimizations. You
> wouldn't expect good performance if you had an application that merged
> multiple SQL databases either...
>
> Holger
>
>
> On Mar 27, 2010, at 4:22 AM, Schmitz, Jeffrey A wrote:
>
>> We’ve been down this road before, let me review the old e-mails and see if I
>> have any other questions…
>>
>> Jeff,
>>
>> whenever you have a MultiUnion graph (or OntModel) that consists of more
>> than one sub-graph, then the performance of the SPARQL engine might go down
>> significantly. This is because the triple matches may need to dynamically
>> merge partial results from multiple sub-graphs. On the other hand, if you
>> just have a single graph (incl a single SDB graph), then the system can
>> exploit native optimizations and do complex graph patterns with a single,
>> optimized operation. In practice my guess is that you will have best
>> performance if you put all sub-graphs into the same SDB (possibly split into
>> named graphs) and then operate on the union graph (via the named graph
>> <urn:x-arq:UnionGraph>).
>>
>> Holger
>>
>>
>>
>> Jeff
>>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Schmitz, Jeffrey A
>> Sent: Friday, March 26, 2010 1:00 PM
>> To: [email protected]
>> Subject: [topbraid-users] SPINInferences.run with OntModel
>>
>> Hello,
>> I had a question about the SPINInferences.run operation. For some
>> reason, when I pass an OntModel in as the first parameter (i.e. the model to
>> be queried) the operation takes a LONG time to complete. But when I pass in
>> a Model, that (I think) is the equivalent of the full OntModel, it is very
>> fast.
>> For example, I have an OntModel (ontModel) that I would like to run
>> the SPIN Rules on to generate the inferred triples into it. Currently, I
>> have to copy the complete OntModel into a Model using the following code:
>>
>> // Copy every statement visible through the OntModel into one plain Model.
>> Model model = ModelFactory.createDefaultModel();
>> model.notifyEvent(GraphEvents.startRead);
>> try {
>>     model.add(ontModel);
>> } finally {
>>     model.notifyEvent(GraphEvents.finishRead);
>> }
>>
>> Then, I can call SPINInferences.run on the Model…
>>
>> SPINInferences.run(model, spinInfModel, _spinRulesClass2QueryMap,
>>         initialTemplateBindings, exp, _spinRuleStats, true,
>>         inferenceType.inferenceProp(),
>>         comparator, null);
>>
>> and it runs very fast. However, if I try to cut the copy out of the
>> equation and just pass in the OntModel directly to SPINInferences.run…
>>
>> SPINInferences.run(ontModel, spinInfModel, _spinRulesClass2QueryMap,
>>         initialTemplateBindings, exp, _spinRuleStats, true,
>>         inferenceType.inferenceProp(),
>>         comparator, null);
>>
>> it runs very slowly. Any ideas on what’s going on here?
>>
>> So,
>> --
>> You received this message because you are subscribed to the Google
>> Group "TopBraid Suite Users", the topics of which include TopBraid Composer,
>> TopBraid Live, TopBraid Ensemble, SPARQLMotion and SPIN.
>> To post to this group, send email to
>> [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/topbraid-composer-users?hl=en
>>
>> To unsubscribe from this group, send email to
>> topbraid-users+unsubscribegooglegroups.com or reply to this email with the
>> words "REMOVE ME" as the subject.
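To make the trade-off discussed in this thread concrete, here is a minimal sketch in plain Java. It is a toy under stated assumptions and not Jena code: `UnionGraphSketch`, `Triple`, `Graph`, `findInUnion` and `copyToSingleGraph` are hypothetical names invented for illustration, and the single by-predicate index only stands in for whatever native indexes a real store maintains. It contrasts a union view, where each lookup has to visit every sub-graph and merge the partial results, with copying everything into one graph up front so that each lookup is a single index access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Toy illustration only: every type here is hypothetical and is NOT the Jena API.
final class UnionGraphSketch {

    /** Plain-string stand-in for an RDF statement. */
    static final class Triple {
        final String s, p, o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Triple)) {
                return false;
            }
            Triple t = (Triple) other;
            return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
        }

        @Override
        public int hashCode() {
            return Objects.hash(s, p, o);
        }
    }

    /** Graph with one index by predicate, standing in for a store's native indexes. */
    static final class Graph {
        private final Map<String, Set<Triple>> byPredicate = new HashMap<>();

        void add(Triple t) {
            byPredicate.computeIfAbsent(t.p, k -> new LinkedHashSet<>()).add(t);
        }

        /** A single index lookup answers the pattern (?s predicate ?o). */
        Set<Triple> find(String predicate) {
            return byPredicate.getOrDefault(predicate, Collections.<Triple>emptySet());
        }

        List<Triple> all() {
            List<Triple> out = new ArrayList<>();
            for (Set<Triple> bucket : byPredicate.values()) {
                out.addAll(bucket);
            }
            return out;
        }
    }

    /** Union view: every lookup visits each sub-graph and merges the partial results. */
    static Set<Triple> findInUnion(List<Graph> subGraphs, String predicate) {
        Set<Triple> merged = new LinkedHashSet<>();
        for (Graph g : subGraphs) {
            merged.addAll(g.find(predicate)); // the set drops triples repeated across sub-graphs
        }
        return merged;
    }

    /** Pay the merge cost once: copy all sub-graphs into one graph, then query only that. */
    static Graph copyToSingleGraph(List<Graph> subGraphs) {
        Graph single = new Graph();
        for (Graph g : subGraphs) {
            for (Triple t : g.all()) {
                single.add(t);
            }
        }
        return single;
    }

    public static void main(String[] args) {
        Graph a = new Graph();
        a.add(new Triple("ex:order1", "rdf:type", "ex:Order"));
        a.add(new Triple("ex:order1", "ex:customer", "ex:acme"));
        Graph b = new Graph();
        b.add(new Triple("ex:order1", "rdf:type", "ex:Order")); // also present in a
        b.add(new Triple("ex:order2", "ex:customer", "ex:acme"));
        List<Graph> subGraphs = Arrays.asList(a, b);

        Graph single = copyToSingleGraph(subGraphs);
        System.out.println(findInUnion(subGraphs, "ex:customer").size());
        System.out.println(single.find("ex:customer").size());
    }
}
```

Both approaches return the same triples; the difference is where the work happens. In the union view the per-sub-graph loop runs on every pattern match, so the cost grows with the number of sub-graphs, whereas the copied graph pays that cost once. This mirrors the `model.add(ontModel)` copy quoted above, although a real SPARQL engine and store will behave differently in detail.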
