Hi Stefan,

I committed the changes to xpath.c to CVS (revision 1.306). The code
contains the optimization for the template "generate-english-index"
as described in my previous mail (below).
Regards,
Kasimier

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of Buchcik, Kasimier
>
> Hi,
>
> > -----Original Message-----
> > From: Rob Richards [mailto:[EMAIL PROTECTED]
> > [...]
>
> > > Currently I'm debugging and concentrating on the xsl:key issues
> > > you already found to be the bottleneck of the whole story.
>
> First results: it's an XSLT but also an XPath issue.
> All xsl:key definitions were computed for all input documents at the
> beginning of the transformation process. This is actually not needed:
> if key() is never called on a document, then the keys need not be
> computed for that document. There's an ugly exception to this: if
> key() is used in a template's match pattern, then one doesn't really
> know whether a key should be computed for a document or not.
>
> The optimizations we applied are the following (already committed to
> CVS):
> 1) Don't compute all keys for all documents.
> 2) Compute a key with a specific name only when it is first requested
>    by a key() function call.
> 3) If there are "keyed" template match patterns, then compute all keys
>    for the current document if there's a chance that the current input
>    node might match such a "keyed" template. So this remains an ugly
>    case, but fortunately the docbook XSLs don't use this nasty feature.
>
> The previous behaviour used a lot of memory, since it computed all
> keys for all documents. This might be the reason why it took ages for
> Ed Catmur (http://bugzilla.gnome.org/show_bug.cgi?id=311857) to build
> the docs; Damon Chaplin already pointed out that he possibly ran out
> of RAM.
> Memory usage on my Win VMware box dropped from ~200 MB to ~110 MB
> after the optimization was applied.
>
> Additionally there's an XPath issue. Stefan Kost and Damon Chaplin
> already identified "autoidx.xsl" as the bottleneck.
>
> Here, in the template "generate-english-index", we have:
>
> <xsl:variable name="terms"
>   select="//indexterm[count(.|key('letter',
>             translate(substring(&primary;, 1, 1),
>                       &lowercase;,
>                       &uppercase;))[&scope;][1]) = 1
>           and not(@class = 'endofrange')]"/>
>
> This looks weird, but it was written by Jeni Tennison, so I assumed
> she knew what she was doing. And in fact, the tiny "[1]) = 1" is the
> part of the expression which e.g. Saxon can use to optimize the
> evaluation.
> With 1732 nodes in the key, Saxon took ~12s on a test box to evaluate
> this expression, while Libxslt took ~50s. On the other hand, if I
> changed the expression to use "count(.....) > 1" (i.e. without the
> "[1]"), then Saxon already needed ages to process 300 nodes, while
> Libxslt's times were still linear. So Saxon is optimized for the case
> where "[1]" is used; Libxslt/Libxml2 is not.
>
> I tried to apply such an optimization to the XPath code, and voila,
> it ran in <1s.
> I'm still a bit lost in the XPath machinery of Libxml2, and not sure
> if the change is correct as it is, so I'll just IFDEF out that part
> when I commit.
> This is work in progress; when Daniel's back and finds time to look
> at it, we'll have a nice optimization.
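[Illustrative sketch, not part of the quoted mail: the expression above is
the Muenchian grouping idiom. The key name "letter" and the element names
come from the quoted snippet, but the flat <indexterm>/<primary> document
structure and this minimal standalone stylesheet are assumptions made only
to show why the trailing "[1]" matters.]

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Group indexterms by the uppercased first letter of their primary term
       (assumed structure: <indexterm><primary>...</primary></indexterm>). -->
  <xsl:key name="letter"
           match="indexterm"
           use="translate(substring(primary, 1, 1),
                          'abcdefghijklmnopqrstuvwxyz',
                          'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>

  <xsl:template match="/">
    <!-- An indexterm is selected only if it is identical to the first node
         key() returns for its letter, i.e. one representative per group.
         The trailing [1] is what an optimizer can exploit: it may stop after
         fetching the first node instead of materialising and counting the
         whole group. -->
    <xsl:for-each select="//indexterm[count(. | key('letter',
                            translate(substring(primary, 1, 1),
                                      'abcdefghijklmnopqrstuvwxyz',
                                      'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))[1]) = 1]">
      <xsl:value-of select="translate(substring(primary, 1, 1),
                                      'abcdefghijklmnopqrstuvwxyz',
                                      'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

[Run over a document with many <indexterm> elements, e.g. with xsltproc,
this prints one line per distinct first letter.]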
>
> The next bottleneck in line is the template "indexterm"
> (mode="reference") in "autoidx.xsl":
>
> number  match      name               mode                      Calls  Tot 100us    Avg
>      0  indexterm                     reference                  2464   13039183   5291
>      1             gentext.template                             16047    2219472    138
>      2             user.head.content                               53    2216486  41820
>      3             chunk                                        191008   1984551     10
>      4  *                             recursive-chunk-filename   92686    799234      8
>
> Regards,
> Kasimier
>
> _______________________________________________
> xslt mailing list, project page http://xmlsoft.org/XSLT/
> [email protected]
> http://mail.gnome.org/mailman/listinfo/xslt

_______________________________________________
xslt mailing list, project page http://xmlsoft.org/XSLT/
[email protected]
http://mail.gnome.org/mailman/listinfo/xslt
