On 5/15/2013 6:25 PM, Michael Kay wrote:
[about the optimizer that it's]
> making pure guesses based on observed behaviour rather than hard data
> - and by doing so, is reinforcing that behaviour. It's a black art.)
This is very insightful. We tend to think of the optimizer as
"go-faster sauce," and often underestimate the impact that optimizers
have, or should have, on program design when performance is critical.
A familiar (to me) example of this is the question of which indexes get
built in persistent data stores. MarkLogic, for example, automatically
builds indexes on all element names plus their words/values, and on all
element/attribute name pairs plus their words/values, and these enable
all kinds of optimizations. But they aren't always the best choices.
One thing we've had to grapple with is customers who use a particular
attribute ("id" comes to mind) that can appear on any element and be
the target of cross-references. In that case, we'd really want an index
on all attributes named "id", regardless of the element name they're
attached to. The ML indexes really do enforce a particular style of
markup (if you want good performance easily).
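To make that concrete, here's a minimal XQuery sketch of the pattern.
The element names ("section", "figure", "table") and the use of
MarkLogic's cts:element-attribute-value-query are illustrative
assumptions, not a recipe; the point is just that the pair-based
indexes force you to enumerate element names up front, where an
attribute-only index would not:

    xquery version "1.0-ml";

    (: $target is the id value a cross-reference is trying to resolve. :)
    declare variable $target as xs:string external;

    (: The natural form of the lookup: any element may carry the id.
       Without an attribute-only index, this is a scan over all
       elements. :)
    let $by-scan := //*[@id eq $target]

    (: The fast path through the element/attribute pair indexes needs
       every element name that might carry @id listed up front (the
       names here are hypothetical): :)
    let $by-index := cts:search(fn:collection(),
      cts:element-attribute-value-query(
        (xs:QName("section"), xs:QName("figure"), xs:QName("table")),
        xs:QName("id"),
        $target))

    return ($by-index, $by-scan)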
As another example, we tend to advise ML customers against using the
same element name in different contexts (say, a title element under
both chapter and section), since an index keyed on element name alone
conflates the two. I don't mean to beat up on ML here -- it now offers
XPath-based indexes, just like eXist! This is more in the way of
illustrating a broader point:
I wonder how much schema design has been / will be influenced by the
availability of various optimizations (and indexing options) in such
systems, and to what extent these schemas will be more or less tuned to
the indexing options available on the platform where they were first
used. Has there ever been any sort of attempt to study which kinds of
indexes are most effective across some wide swath of use cases? I can't
imagine how one would gather enough meaningful cases for that, so
perhaps it's a mere pipe dream. By the same token, has there been any
attempt to standardize the specification of XML indexes, as we have for
SQL indexes? I guess we have the example of xsl:key -- that's really
the only standard I know of.
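For what it's worth, xsl:key really is a declarative index
specification, roughly the analogue of SQL's CREATE INDEX: you declare
the key once, and the processor is free to build whatever lookup
structure it likes behind it. A two-line sketch (in XSLT, of course,
rather than XQuery), tying back to the "id" example above -- the key
name and the $target variable are arbitrary:

    <!-- Declared once at the top level of the stylesheet: -->
    <xsl:key name="by-id" match="*" use="@id"/>

    <!-- A cross-reference lookup through the declared key: -->
    <xsl:variable name="hit" select="key('by-id', $target)"/>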
To echo what Daniela said in an earlier message in this thread, I think
the key to helping users work with optimizers is twofold: make it
apparent to the user what optimizations are being performed (if they
ask), so they can tell whether the optimizer is working for or against
them; and provide tools for the user to specify particular
optimizations, or to constrain the optimizer, at least in critical
decisions. There are probably too many details to expose everything,
but in the case of indexing optimizations in particular, the correct
(or incorrect) choice can
have such an overwhelming effect on performance that it is really
important to give the user the ability to understand and control the
execution plan.
Query plans can often be opaque and difficult for all but the most
expert users to understand, though. This has historically been true for
SQL query plans as well, although I think visualization tools can
sometimes help. I like the approach of expressing all query
optimizations as built-in functions. In this way, an optimized query is
just another query in the same language the user is familiar with,
albeit with some special-purpose functions they have to learn in order
to understand what the optimizations are.
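A hedged sketch of what that could look like, again as XQuery
fragments. The idx:attribute-lookup function is entirely made up -- it
is no particular product's API -- but it shows the idea: the "plan" is
readable as a query in the language the user already knows:

    (: What the user writes: :)
    //section[@id eq $target]

    (: What the optimizer might report back as its plan, expressed in
       the same language (idx:attribute-lookup is a hypothetical
       index-probe function): :)
    idx:attribute-lookup(xs:QName("section"), xs:QName("id"), $target)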
-Mike