Just thought I'd chime in.  One of my experiences in doing the sorting work
in Xalan-C was, that for large data sets, the cost of caching could start
to overwhelm the comparison cost.  As a result, I left some conditional
code in that disabled caching, which helped for large documents where the
sort key values were large.

Dave



                                                                                
                                         
                      Scott                                                     
                                         
                      Boag/Cambridge/I         To:      Martin Dengler <[EMAIL 
PROTECTED]>                                
                      BM                       cc:      [EMAIL PROTECTED], 
(bcc: David N                      
                      <[EMAIL PROTECTED]         Bertoni/Cambridge/IBM)         
                                           
                      bm.com>                  Subject: Re: Xalan sort 
algorithm performance?                            
                                                                                
                                         
                      03/21/2003 12:10                                          
                                         
                      PM                                                        
                                         
                                                                                
                                         




It's a mergesort (see Robert Sedgewick's Algorithms book).  An issue may be
the match-and-select it has to do, though these should be cached so that
once the match-and-select is done, only the comparison need be made.  At
one time I am under the impression that our sorting was comparatively
pretty good.  But it's been I while since I've been in that code.

You might want to root around in the
org.apache.xalan.transformer.NodeSorter code, or do a line profile of that
class.

-scott

Martin Dengler <[EMAIL PROTECTED]> wrote on 03/20/2003 04:17:27 PM:

> Hi,
>
>    Please yell if this is an FAQ or my googling skills need help.
>
>    What sorting algorithm does Xalan use -- and what is the algorithm(s)
> runtime and memory footprint in big-O notation?
>
>    I am using Xalan to sort large XML documents: approx 3 levels deep of
> a tree with 2-20,000 first child (attached to root) nodes and 30
> grandchildren per child node.  Sorting is done on an attribute of the
> grandchild nodes.  The performance -- especially in memory usage -- is
> abysmal: 10-60 seconds sort time, memory usage is at least O(n^2).
>
>    I've read the w3c performance pointers, but cutting the document down
> and doing fewer transforms really doesn't cut it (neither are applicable
> to a one-pass sort of a large document).
>
>    Are there any ways to change the sort algortihm's memory/runtime
> behavior?
>
> Thanks,
> Martin
>
> PS -- test specs: Xerces, Xalan (latest), Solaris 2.8, JVM 1.2.2_07, Sun
> E450, 2GB real mem, 1GB JVM heap, 2x450mhz processors.
> [attachment "signature.asc" deleted by Scott Boag/Cambridge/IBM]

Reply via email to