Re: Repost: Xerces XML performance problems

Nath Wed, 26 May 2004 07:25:31 -0700

Sure thing,

// Member variables
XercesDOMParser *cXMLParser;
XERCES_CPP_NAMESPACE_QUALIFIER DOMDocument *cXMLDoc;
DOMNodeList *cXMLNodeList,
            *cChildNodeList;
DOMNode *cXMLNode;
DOMNamedNodeMap *cXMLNamedNode;

// Initialization
XMLPlatformUtils::Initialize();
cXMLParser = new XercesDOMParser();
cXMLParser->setValidationScheme(XercesDOMParser::Val_Never);
cXMLParser->setLoadExternalDTD(false);

// Main code
cXMLParser->parse(filename);
cXMLDoc = cXMLParser->getDocument();

// Get word nodes
cXMLNodeList = cXMLDoc->getElementsByTagName(XMLString::transcode("word"));

// Loop through all word nodes
for (int i = 0; i < cXMLNodeList->getLength(); i++)
{
   // Obtain list of child nodes
   cChildNodeList = cXMLNodeList->item(i)->getChildNodes();

   // Loop through all child nodes
   for (int j = 0; j < cChildNodeList->getLength(); j++)
   {
      strcpy(name,
             XMLString::transcode(cChildNodeList->item(j)->getTextContent()));

      // . . . . definitions and whatnot are also copied here
   }
}
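
A note on the loop shape above, offered as a sketch rather than a diagnosis: if a node list is backed by a linked list of siblings (an assumption here, not something confirmed for Xerces in this thread), then each item(i) call has to walk i links, and an index-based loop over n nodes costs on the order of n*n/2 hops. Walking with getFirstChild()/getNextSibling() visits each node once. The model below uses a hypothetical Node struct and hypothetical helper names (makeChildren, itemAt, walkByIndex, walkBySibling) purely to count hops; it does not call Xerces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node: children form a singly linked list.
struct Node {
    std::string text;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Build n children under `parent`. The returned vector owns the children,
// so it must outlive any traversal of `parent`.
std::vector<Node> makeChildren(Node& parent, std::size_t n) {
    std::vector<Node> kids(n);
    for (std::size_t i = 0; i < n; ++i) {
        kids[i].text = "child" + std::to_string(i);
        kids[i].nextSibling = (i + 1 < n) ? &kids[i + 1] : nullptr;
    }
    parent.firstChild = n ? &kids[0] : nullptr;
    return kids;
}

// Indexed lookup on a linked list: reaching child `index` costs `index` hops.
Node* itemAt(const Node& parent, std::size_t index, std::size_t& hops) {
    Node* n = parent.firstChild;
    while (n != nullptr && index > 0) {
        n = n->nextSibling;
        --index;
        ++hops;
    }
    return n;
}

// Strategy A: index-based loop. Returns the hops spent inside itemAt
// (the one-off pass that counts the children is not included).
std::size_t walkByIndex(const Node& parent, std::vector<std::string>& out) {
    std::size_t len = 0;
    for (Node* n = parent.firstChild; n != nullptr; n = n->nextSibling) ++len;

    std::size_t hops = 0;
    for (std::size_t i = 0; i < len; ++i) {
        Node* n = itemAt(parent, i, hops);
        assert(n != nullptr);
        out.push_back(n->text);
    }
    return hops;
}

// Strategy B: follow sibling pointers. Returns the number of hops taken,
// which is one per child.
std::size_t walkBySibling(const Node& parent, std::vector<std::string>& out) {
    std::size_t hops = 0;
    for (Node* n = parent.firstChild; n != nullptr; n = n->nextSibling) {
        out.push_back(n->text);
        ++hops;
    }
    return hops;
}
```

With 100 children, walkByIndex reports 4950 hops against 100 for walkBySibling, and both collect the same strings. Whether this pattern accounts for the slowdown described in this thread is not established; it only illustrates why an indexed loop over a linked structure can grow faster than linearly.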





----- Original Message ----- 
From: "Erik Rydgren" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; "'Nath'" <[EMAIL PROTECTED]>
Sent: Wednesday, May 26, 2004 2:29 AM
Subject: RE: Repost: Xerces XML performance problems


> Can you please provide a snippet of your DOM tree traversing code. It is
> hard to see what the problem is if we do not know what you are doing.
>
> Regards
> Erik
>
> > -----Original Message-----
> > From: Nath [mailto:[EMAIL PROTECTED]
> > Sent: den 25 maj 2004 19:07
> > To: [EMAIL PROTECTED]
> > Subject: Repost: Xerces XML performance problems
> >
> > I had a mix-up in mailing lists, so I'm reposting my question here
> > (with some amendments to make it clearer) for any assistance.
> >
> >
> >
> >
> > I converted a dictionary of words and definitions into XML files (one
> > file per letter of the alphabet), each weighing around 1-5 MB (I chose
> > XML for storage and extensibility reasons). I'm trying to access node
> > information from these files and it's taking an incredible amount of
> > time to do it. When acquiring node information from small files
> > (letters X, Y, and Z: a total of 815 words or 151 KB), the DOM document
> > returns results somewhat quickly and I can process the entire tree in
> > less than 2 seconds. When parsing the letter A file (some 11,000 words
> > or 1.58 MB), it takes 5 seconds just to process 20 word nodes (see
> > below for a typical word node). It seems the larger the XML file (i.e.,
> > the more nodes within), the longer it takes to process all the nodes.
> > Granted, there's obviously going to be more time involved, but between
> > the 2 files I've tested, there doesn't seem to be a linear process-time
> > relationship. Can anyone suggest why this is happening and how I can
> > fix it? I've used Xerces-C++ 2.4.0 and recently upgraded to Xerces-C++
> > 2.5.0.
> >
> >
> > I'm just following the standard XML start-up and DOM parsing procedure
> > - Initialize platform utils
> > - Don't validate files
> > - parse and assign DOM document (fast)
> > - go through each child node and collect data (slow)
> >
> >
> >
> > The dictionary format is simply:
> >
> > <dictionary>
> >
> > <word>
> >
> > <name>whatever</name>
> >
> > <def> 1 </def>
> >
> > <def> 2 </def>
> >
> >
> >
> > </word>
> >
> >
> >
> > </dictionary>
> >
> > I have a 1600 MHz processor, so handling files of a few MB should be
> > fairly quick. I've also tried parsing the file with SAX; although the
> > performance is a tad better, the end result is still a lengthy wait.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>

