Just thought I'd chime in. One of my experiences in doing the sorting work
in Xalan-C was, that for large data sets, the cost of caching could start
to overwhelm the comparison cost. As a result, I left some conditional
code in that disabled caching, which helped for large documents where the
sort key values were large.
Dave
Scott
Boag/Cambridge/I To: Martin Dengler <[EMAIL
PROTECTED]>
BM cc: [EMAIL PROTECTED],
(bcc: David N
<[EMAIL PROTECTED] Bertoni/Cambridge/IBM)
bm.com> Subject: Re: Xalan sort
algorithm performance?
03/21/2003 12:10
PM
It's a mergesort (see Robert Sedgewick's Algorithms book). An issue may be
the match-and-select it has to do, though these should be cached so that
once the match-and-select is done, only the comparison need be made. At
one time I am under the impression that our sorting was comparatively
pretty good. But it's been I while since I've been in that code.
You might want to root around in the
org.apache.xalan.transformer.NodeSorter code, or do a line profile of that
class.
-scott
Martin Dengler <[EMAIL PROTECTED]> wrote on 03/20/2003 04:17:27 PM:
> Hi,
>
> Please yell if this is an FAQ or my googling skills need help.
>
> What sorting algorithm does Xalan use -- and what is the algorithm(s)
> runtime and memory footprint in big-O notation?
>
> I am using Xalan to sort large XML documents: approx 3 levels deep of
> a tree with 2-20,000 first child (attached to root) nodes and 30
> grandchildren per child node. Sorting is done on an attribute of the
> grandchild nodes. The performance -- especially in memory usage -- is
> abysmal: 10-60 seconds sort time, memory usage is at least O(n^2).
>
> I've read the w3c performance pointers, but cutting the document down
> and doing fewer transforms really doesn't cut it (neither are applicable
> to a one-pass sort of a large document).
>
> Are there any ways to change the sort algortihm's memory/runtime
> behavior?
>
> Thanks,
> Martin
>
> PS -- test specs: Xerces, Xalan (latest), Solaris 2.8, JVM 1.2.2_07, Sun
> E450, 2GB real mem, 1GB JVM heap, 2x450mhz processors.
> [attachment "signature.asc" deleted by Scott Boag/Cambridge/IBM]