I had a mix-up with mailing lists, so I'm reposting my question here (with
some amendments to make it clearer) in the hope of getting some assistance.

I converted a dictionary of words and definitions into XML files (one file
per letter of the alphabet), each around 1-5 MB (I chose XML for storage and
extensibility reasons). I'm trying to access node information from these
files and it's taking an extremely long time. When acquiring node
information from the small files (letters X, Y, and Z: a total of 815 words,
or 151 KB), the DOM document returns results fairly quickly and I can
process the entire tree in under 2 seconds. When parsing the letter A file
(some 11,000 words, or 1.58 MB), it takes 5 seconds just to process 20 word
nodes (see below for a typical word node). It seems the larger the XML file
(i.e., the more nodes within it), the longer each individual node takes to
process. Granted, more nodes obviously mean more total time, but between the
two files I've tested, the relationship between file size and processing
time doesn't look anywhere near linear. Can anyone suggest why this is
happening and how I can fix it? I was using Xerces-C++ 2.4.0 and recently
upgraded to Xerces-C++ 2.5.0.


I'm just following the standard Xerces start-up and DOM parsing procedure
(a simplified sketch follows the list):
- initialize the platform utilities
- turn validation off
- parse the file and get the DOM document (fast)
- walk each child node and collect its data (slow)
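
In case it helps, here's roughly what that looks like in code (a simplified
sketch; the file name is a placeholder and error handling and the traversal
itself are omitted):

#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>

using namespace xercesc;

int main()
{
    XMLPlatformUtils::Initialize();          // initialize platform utils
    {
        XercesDOMParser parser;
        parser.setValidationScheme(XercesDOMParser::Val_Never); // don't validate
        parser.setDoNamespaces(false);

        parser.parse("a.xml");                   // this part is fast
        DOMDocument* doc = parser.getDocument(); // document is owned by the parser

        // ... walk the tree and collect data (the slow part) ...
    }   // parser (and its document) go away before Terminate()
    XMLPlatformUtils::Terminate();
    return 0;
}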



The dictionary format is simply:

<dictionary>
  <word>
    <name>whatever</name>
    <def> 1 </def>
    <def> 2 </def>
  </word>
</dictionary>
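
And here's roughly how I walk that structure (again a simplified sketch;
what I do with the collected strings is omitted):

#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/XMLString.hpp>

using namespace xercesc;

void walkDictionary(DOMDocument* doc)
{
    DOMElement* root = doc->getDocumentElement();   // <dictionary>

    // Walk sibling pointers; as far as I can tell, calling
    // DOMNodeList::item(i) in a loop re-walks the child list from the
    // start each time, which would go quadratic on large files.
    for (DOMNode* word = root->getFirstChild(); word != 0;
         word = word->getNextSibling())
    {
        if (word->getNodeType() != DOMNode::ELEMENT_NODE)
            continue;                               // skip whitespace text nodes

        for (DOMNode* field = word->getFirstChild(); field != 0;
             field = field->getNextSibling())
        {
            if (field->getNodeType() != DOMNode::ELEMENT_NODE)
                continue;

            DOMNode* text = field->getFirstChild(); // the text node, if any
            if (text == 0)
                continue;

            // Transcode once and release right away; leaking these
            // buffers on an 11,000-word file adds up fast.
            char* tag = XMLString::transcode(field->getNodeName());
            char* val = XMLString::transcode(text->getNodeValue());
            // ... store (tag, val) for this word ...
            XMLString::release(&tag);
            XMLString::release(&val);
        }
    }
}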

I have a 1600 MHz processor, so handling files of a few megabytes should be
fairly quick. I've also tried parsing the files with SAX (sketched below);
although the performance is a tad better, the end result is still a lengthy
wait.
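
The SAX attempt is a bare-bones SAX2 handler along these lines (simplified;
the buffering of word/def text is omitted, and it assumes
XMLPlatformUtils::Initialize() has already been called):

#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/XMLReaderFactory.hpp>
#include <xercesc/sax2/DefaultHandler.hpp>
#include <xercesc/sax2/Attributes.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/XMLUni.hpp>

using namespace xercesc;

class DictHandler : public DefaultHandler
{
public:
    void startElement(const XMLCh* const /*uri*/,
                      const XMLCh* const localname,
                      const XMLCh* const /*qname*/,
                      const Attributes& /*attrs*/)
    {
        char* tag = XMLString::transcode(localname);
        // ... note whether we're inside <name> or <def> ...
        XMLString::release(&tag);
    }

    void characters(const XMLCh* const chars, const unsigned int length)
    {
        // ... append chars to the buffer for the current element ...
    }
};

void parseWithSax(const char* path)
{
    SAX2XMLReader* reader = XMLReaderFactory::createXMLReader();
    reader->setFeature(XMLUni::fgSAX2CoreValidation, false);  // don't validate

    DictHandler handler;
    reader->setContentHandler(&handler);
    reader->setErrorHandler(&handler);
    reader->parse(path);
    delete reader;
}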
