Hi, > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > On Behalf Of Frans Englich > > On Monday 22 May 2006 10:54, Buchcik, Kasimier wrote: > [...] > > > The tiny "[1]) = 1" is the part > > of the expression, which e.g. Saxon can use to optimize the > evaluation. > > With 1732 nodes in the key, Saxon took ~12s on a test box, > while Libxslt > > took ~50s for evaluation of this expression. On the other hand, if I > > changed this expression to use "count(.....) > 1" (i.e. without the > > "[1]") > > then Saxon needed already ages to process 300 nodes, while > Libxslt times > > were still linear. So Saxon is optimized, if "[1]" is used; > > Libxslt/Libxml2 > > not. > > What kind of optimizations does libxml2 do on predicated, in general? > > For the "[1]" predicate, I think two types can be done(when > we're talking about implementations like Saxon and libxml2): > > * Analysis of context dependency, and as result conclude how > the predicate must be evaluated. For example, "1" doesn't depend on the > context and will therefore always evaluate to the same. However, an expression > involving "." would have to be evaluated each time. End point, some > predicates only have to be evaluated once. > > * Early exit. For example, one knows that "1" will never > match beyond the first item. > > Saxon implements both.
The following might be a bit vague, since I only scratched Libxml2's XPath code. I think Libxml2 does only the "early exit" optimization; at least I haven't found code indicating the opposite. Although in the case of [n], it doesn't evaluate "n", but just triggers execution of optimized evaluation functions wrt the position. The 3 functions are: 1) xmlXPathCompOpEvalFirst() - used for "[1]" 2) xmlXPathCompOpEvalLast() - used for "last()" 3) xmlXPathNodeCollectAndTestNth() - used for "[n]" But there's one catch for the "early exit": In Libxml2 each predicate is evaluated on the node-set result of its preceding predicate; so the "early exit" is only possible at the last level of predicates. Example: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:key name="mykey" match="//foo" use="@val"/> <xsl:template match="/"> <xsl:value-of select="count(key('mykey', 'a')[EMAIL PROTECTED]@zoo][1])"/> </xsl:template> </xsl:stylesheet> <?xml version="1.0"?> <foo val="a" boo="x" zoo="x"> <foo val="a" boo="x" zoo="x"/> </foo> Theoretically the evaluation could stop on the first "foo" element in doc-order, but it does not: first the predicate "@boo" is applied to both "foo" elements, which results in a node-set containing both elements. Then the predicate "@zoo" is applied to the node-set, and only here we can "exit early" on the first match. I know this goes quite hand in hand with what the XPath says, but I think we loose opportunities for optimization here. I think the result would be same same, if the predicates were evaluated all in a row on each node of the initial node-set. If such a mechanism would be more complex than the one existing in an other question. It would be interesting to hear how Saxon has implemented this. Regards, Kasimier _______________________________________________ xslt mailing list, project page http://xmlsoft.org/XSLT/ [email protected] http://mail.gnome.org/mailman/listinfo/xslt
