RE: [xslt] RE: [xml] does xsltproc caches subexpressions

Buchcik, Kasimier Mon, 22 May 2006 10:17:12 -0700

Hi, 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> On Behalf Of Frans Englich
> 
> On Monday 22 May 2006 10:54, Buchcik, Kasimier wrote:
> [...]
> 
> > The tiny "[1]) = 1" is the part 
> > of the expression, which e.g. Saxon can use to optimize the 
> evaluation.
> > With 1732 nodes in the key, Saxon took ~12s on a test box, 
> while Libxslt
> > took ~50s for evaluation of this expression. On the other hand, if I
> > changed this expression to use "count(.....) > 1" (i.e. without the
> > "[1]")
> > then Saxon needed already ages to process 300 nodes, while 
> Libxslt times
> > were still linear. So Saxon is optimized, if "[1]" is used;
> > Libxslt/Libxml2
> > not.
> 
> What kind of optimizations does libxml2 do on predicated, in general?
> 
> For the "[1]" predicate, I think two types can be done(when 
> we're talking about implementations like Saxon and libxml2):
> 
> * Analysis of context dependency, and as result conclude how 
> the predicate must be evaluated. For example, "1" doesn't depend on
the 
> context and will therefore always evaluate to the same. However, an
expression 
> involving "." would have to be evaluated each time. End point, some 
> predicates only have to be evaluated once.
> 
> * Early exit. For example, one knows that "1" will never 
> match beyond the first item.
> 
> Saxon implements both.


The following might be a bit vague, since I only scratched Libxml2's
XPath
code.
I think Libxml2 does only the "early exit" optimization; at least I
haven't
found code indicating the opposite. Although in the case of [n], it
doesn't
evaluate "n", but just triggers execution of optimized evaluation
functions
wrt the position. The 3 functions are:
1) xmlXPathCompOpEvalFirst()       - used for "[1]"
2) xmlXPathCompOpEvalLast()        - used for "last()"
3) xmlXPathNodeCollectAndTestNth() - used for "[n]"

But there's one catch for the "early exit": In Libxml2 each predicate is
evaluated on the node-set result of its preceding predicate; so the
"early exit" is only possible at the last level of predicates.

Example:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
  <xsl:key name="mykey" match="//foo" use="@val"/>
  <xsl:template match="/">    
          <xsl:value-of select="count(key('mykey',
'a')[EMAIL PROTECTED]@zoo][1])"/>
  </xsl:template>
</xsl:stylesheet>

<?xml version="1.0"?>
<foo val="a" boo="x" zoo="x">
  <foo val="a" boo="x" zoo="x"/>
</foo>

Theoretically the evaluation could stop on the first "foo" element in
doc-order, but it does not: first the predicate "@boo" is applied to
both "foo" elements, which results in a node-set containing both
elements.
Then the predicate "@zoo" is applied to the node-set, and only here we
can
"exit early" on the first match. I know this goes quite hand in hand
with
what the XPath says, but I think we loose opportunities for optimization
here.

I think the result would be same same, if the predicates were evaluated
all
in a row on each node of the initial node-set. If such a mechanism would
be more complex than the one existing in an other question.

It would be interesting to hear how Saxon has implemented this.

Regards,

Kasimier

_______________________________________________
xslt mailing list, project page http://xmlsoft.org/XSLT/
[email protected]
http://mail.gnome.org/mailman/listinfo/xslt

RE: [xslt] RE: [xml] does xsltproc caches subexpressions

Reply via email to