Christian, in the end it's really no different from any other kind of
database application.  There will always be a point at which
performance suffers and/or memory becomes scarce.  There are also
tradeoffs for large heap spaces - garbage collection overhead
increases with the size of the memory you are trying to collect garbage on.

So the first set of advice I'd give is to look carefully at your
queries.  Keep in mind that ?s ?p ?o says "bring everything into
memory", thus defeating the purpose of using a back-end.  Fortunately
Composer's SPARQL engine, ARQ, tends to ignore these.  But the advice
is still sage - cut down the size of the graph match as soon as
possible.  Note that OPTIONAL is fairly dangerous to start with - it
states "match this graph pattern if you can, but keep the solution
either way" and thus increases the search space.  We are working to
optimize this better.

Turn on Profiling Mode in the SPARQL View's query debugger and find
the triple pattern placements that result in the fewest match attempts
- i.e. use it to make sure you prune the search space optimally.  It
will also give you the true ordering of evaluation, which is normally
top-to-bottom, but ARQ will perform some optimizations.  Try turning
on filter placement and filter early, if that's possible.  Use LIMIT
and OFFSET to experiment with pieces of the results.
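
To make the "fewest match attempts" point concrete, here is a minimal
Python sketch.  It is not ARQ or TDB code: it assumes a toy in-memory
list of triples and a naive nested-loop evaluator that rescans every
triple for each incoming binding.  The data and the names `match` and
`evaluate` are made up for illustration.  A real store uses indexes, so
only the relative difference between the two orderings means anything.

```python
# Toy data (hypothetical): 200 typed resources, only two of which work for acme.
TRIPLES = [(f"person{i}", "rdf:type", "Person") for i in range(200)]
TRIPLES += [("person7", "worksFor", "acme"), ("person42", "worksFor", "acme")]


def match(pattern, binding):
    """Yield extended bindings for one triple pattern; variables start with '?'."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def evaluate(patterns):
    """Evaluate patterns top-to-bottom; return (solutions, triples scanned)."""
    scanned = 0
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            scanned += len(TRIPLES)  # one full scan per incoming binding
            next_solutions.extend(match(pattern, binding))
        solutions = next_solutions
    return solutions, scanned


broad = ("?s", "rdf:type", "Person")
selective = ("?s", "worksFor", "acme")

broad_first, broad_scans = evaluate([broad, selective])
selective_first, selective_scans = evaluate([selective, broad])
print(len(broad_first), broad_scans)          # 2 40602
print(len(selective_first), selective_scans)  # 2 606
```

Both orderings return the same two solutions, but putting the selective
pattern first prunes the search space before the broad pattern runs.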

Avoid string matches as much as possible - FILTERs are often deadly in
this respect, so make sure you cut down your search space as much as
possible before doing this kind of string compare.  Note that this partly
conflicts with the earlier advice to filter early.  Where to place filters depends
on how computationally expensive the filter expression is and how much
it prunes the search space for downstream processing.
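
The tradeoff can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions: the rows stand in for query
solutions, the year comparison plays the cheap filter, and a regular
expression plays the expensive string match.  The names `ROWS`,
`ENDS_IN_7` and `filter_rows` are hypothetical.

```python
import re

# Hypothetical rows standing in for intermediate query solutions.
ROWS = [{"label": f"Item number {i}", "year": 2000 + i % 7} for i in range(700)]
ENDS_IN_7 = re.compile(r"7$")


def filter_rows(rows, cheap_first):
    """Return (matching rows, number of regex evaluations performed)."""
    regex_calls = 0
    result = []
    for row in rows:
        if cheap_first and row["year"] != 2003:
            continue  # cheap comparison prunes before the string match
        regex_calls += 1
        if ENDS_IN_7.search(row["label"]) and row["year"] == 2003:
            result.append(row)
    return result, regex_calls


late, late_calls = filter_rows(ROWS, cheap_first=False)
early, early_calls = filter_rows(ROWS, cheap_first=True)
print(len(late), late_calls)    # 10 700
print(len(early), early_calls)  # 10 100
```

The result is the same either way; what changes is how often the
expensive expression has to run.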

There are probably a few other SPARQL query tips and I'd love to hear
from others on this. (Self-plug alert: We cover SPARQL query
performance in our Advanced Product training.)

Divide-and-conquer is the next step.  View your data back-end as a
huge well from which you want only a specific bucketful at a time, being
careful to leave the rest where it is.  I.e. view heap space memory as
a limited resource.  SPARQLMotion is an excellent tool for this task
as it can be used to get specific sets of triples that can be combined
for further processing.  Remember that ASK queries terminate as soon as a
match is found, so if you just need to find out whether the graph pattern
exists, ASK is much more efficient than SELECT.
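
The early-termination behaviour is easy to picture with a lazy
generator.  The following Python sketch is an analogy, not how ARQ is
implemented; the data and the names `solutions`, `select` and `ask` are
assumptions made for the example.

```python
# Hypothetical data: every 100th triple uses the predicate we look for.
DATA = [(f"s{i}", "hasName" if i % 100 == 5 else "other", "o")
        for i in range(10000)]


def solutions(triples, predicate, counter):
    """Lazily yield matching subjects, counting the triples inspected."""
    for s, p, _o in triples:
        counter["inspected"] += 1
        if p == predicate:
            yield s


def select(triples, predicate):
    """SELECT-like: materialize every solution."""
    counter = {"inspected": 0}
    rows = list(solutions(triples, predicate, counter))
    return rows, counter["inspected"]


def ask(triples, predicate):
    """ASK-like: stop at the first solution."""
    counter = {"inspected": 0}
    found = any(True for _ in solutions(triples, predicate, counter))
    return found, counter["inspected"]


rows, select_inspected = select(DATA, "hasName")
found, ask_inspected = ask(DATA, "hasName")
print(len(rows), select_inspected)  # 100 10000
print(found, ask_inspected)         # True 6
```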

For SPIN the advice is pretty much the same (Holger can say more).  Be
sure to use ?this because the engine is optimized for that pre-bound
variable.  SPIN will iterate until no new triples are found unless you
tell the engine otherwise.  Turn iterations off if that makes sense
for your application.  If some rules gain nothing from repeated forward
chaining, set spin:rulePropertyMaxIterationCount on the rule property
to the maximum number of iterations needed.
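
For anyone unfamiliar with what "iterate until no new triples are
found" means in practice, here is a generic fixpoint loop in Python.
It is a sketch of forward chaining in general, not the SPIN engine;
`run_rules`, `ancestor_rule` and the `max_iterations` parameter are
hypothetical names that only mimic the idea of an iteration cap.

```python
def run_rules(triples, rules, max_iterations=None):
    """Apply rules until no new triples appear or the iteration cap is hit."""
    known = set(triples)
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        inferred = set()
        for rule in rules:
            inferred |= rule(known)
        new = inferred - known
        if not new:
            break  # fixpoint reached: nothing new was inferred
        known |= new
        iterations += 1
    return known, iterations


def ancestor_rule(known):
    """parent(a, b) gives ancestor(a, b); ancestor(a, b) + parent(b, c) gives ancestor(a, c)."""
    out = {(s, "ancestor", o) for s, p, o in known if p == "parent"}
    out |= {(a, "ancestor", c)
            for a, p1, b in known if p1 == "ancestor"
            for b2, p2, c in known if p2 == "parent" and b2 == b}
    return out


# A chain a -> b -> c -> d -> e of parent links (made-up data).
PARENTS = [("a", "parent", "b"), ("b", "parent", "c"),
           ("c", "parent", "d"), ("d", "parent", "e")]

full, full_iterations = run_rules(PARENTS, [ancestor_rule])
capped, capped_iterations = run_rules(PARENTS, [ancestor_rule], max_iterations=1)
print(len(full), full_iterations)      # 14 4
print(len(capped), capped_iterations)  # 8 1
```

If one pass already gives you everything the application needs, the cap
saves the remaining passes over the data.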

Many things to try here and I hope this isn't mistaken as being the
last word on this topic!

-- Scott

On May 28, 12:36 pm, Christian Fuerber <[email protected]> wrote:
> Hi Holger,
>
> thank you for the information. I finally set up a TDB triplestore with
> over 53 million triples and it shows up perfectly in TBC. But now I'm
> having performance problems when running SPARQL queries over the
> triplestore in TBC's SPARQL view. SPIN constraints also seem to run
> forever. I'm using the 64bit version of TBC 3.3.1 SE and set the java
> heapspace limit to 6144m.
>
> Are there any configurations that can speed up TBC's queries and SPIN
> constraints besides query optimization and java heapspace settings?
>
> Thanks,
> Christian
>
> On 24 May, 03:51, Holger Knublauch <[email protected]> wrote:
>
> > Hi Christian,
>
> > TopBraid's Sesame 2 integration is not as optimized as our other database 
> > interfaces. In practice this means that TBC cannot use any native 
> > optimizations that the database server (in your case Virtuoso) may provide. 
> > Instead, it will break down even the most complex SPARQL queries into small 
> > findSPO queries, which may lead to significant performance problems. Maybe 
> > that's why constraint checking apparently did not work for you when you ran 
> > SPIN constraints on Virtuoso. With smaller databases this impact may not 
> > have been sufficiently severe to notice. But I am glad that you have been 
> > able to confirm that Virtuoso is working well with TBC in principle.
>
> > Over the weekend we also had some enlightening examples of SPARQL queries 
> > that were not optimized: we had a query with three OPTIONAL clauses, 
> > leading to a large number of potential combinations. Replacing those with 
> > other SPARQL patterns led to speed improvements of many orders of 
> > magnitude. Just saying this in case you have some "dangerous" queries in 
> > your constraints. I assume you have executed the queries individually, e.g. 
> > from the SPARQL view, to test their performance before putting them into 
> > your SPIN constraint library.
>
> > Finally, I discovered a performance issue after running the SPIN 
> > constraints from the Problems View: if hundreds or thousands of violations 
> > were found, then just updating those into the Problems view may freeze the 
> > system. I have just fixed this for 3.3.2 (and 3.4).
>
> > Regards,
> > Holger
>
> > On May 21, 2010, at 10:25 PM, Christian Fuerber wrote:
>
> > > Hi Holger,
>
> > > fortunately I can apply SPIN constraints now on the small graph (245
> > > triples) in virtuoso. Although I still do not know what the problem
> > > was. Maybe it works now because I reinstalled java. I will also send
> > > you the error log, in case you are interested.
>
> > > Thanks a lot,
>
> > > Christian
>
> > > On 21 May, 00:20, Holger Knublauch <[email protected]> wrote:
> > >> On May 21, 2010, at 1:08 AM, Christian Fuerber wrote:
>
> > >>> Hi Holger,
>
> > >>> thank you for the quick response. Yes, I could successfully connect to
> > >>> a graph in virtuoso that has 245 triples. But the SPIN constraint
> > >>> checks are not working on the graph's data. I receive an error "Could
> > >>> not run checker" when executing "Refresh and show problems
> > >>> (constraints)". SPARQL in TBC is also not working. I just can see the
> > >>> classes and instances in the editor.
>
> > >> Are there any more details available, e.g. the Error Log?
>
> > >> And yes, TDB will almost certainly be faster for SPARQL, because it will 
> > >> "live" in the same JVM, so no communication overhead is needed. 
> > >> Furthermore, TDB is better optimized to work with the ARQ SPARQL engine.
>
> > >> Thanks
> > >> Holger
>
> > >> --
> > >> You received this message because you are subscribed to the Google
> > >> Group "TopBraid Suite Users", the topics of which include TopBraid 
> > >> Composer,
> > >> TopBraid Live, TopBraid Ensemble, SPARQLMotion and SPIN.
> > >> To post to this group, send email to
> > >> [email protected]
> > >> To unsubscribe from this group, send email to
> > >> [email protected]
> > >> For more options, visit this group at
> > >> http://groups.google.com/group/topbraid-users?hl=en

